Molecular modelling 



Molecular modelling is a collective term that refers to 
theoretical methods and computational techniques to 
model or mimic the behaviour of molecules. The 
techniques are used in the fields of computational 
chemistry, computational biology and materials science 
for studying molecular systems ranging from small 
chemical systems to large biological molecules and 
material assemblies. The simplest calculations can be 
performed by hand, but inevitably computers are required 
to perform molecular modelling of any reasonably sized 
system. The common feature of molecular modelling 
techniques is the atomistic level description of the 
molecular systems; the lowest level of information is 
individual atoms (or a small group of atoms). This is in 
contrast to quantum chemistry (also known as electronic 
structure calculations) where electrons are considered 
explicitly. The benefit of molecular modelling is that it 
reduces the complexity of the system, allowing many more 
particles (atoms) to be considered during simulations. 

Molecular mechanics is one aspect of molecular 
modelling, as it refers to the use of classical 
mechanics/Newtonian mechanics to describe the physical 
basis behind the models. Molecular models typically 
assign atoms coordinates in Cartesian space or in internal 
coordinates, and atoms can also be assigned velocities in 
dynamical simulations. The atomic velocities are related 
to the temperature of the system, a macroscopic 
quantity. The collective mathematical expression is 
known as a potential function and is related to the 
system internal energy (U), a thermodynamic quantity 
equal to the sum of potential and kinetic energies. 
Methods which minimize the potential energy are 
known as energy minimization techniques (e.g., 
steepest descent and conjugate gradient), while 
methods that model the behaviour of the system with 
propagation of time are known as molecular dynamics. 
This function, referred to as a potential function, computes the molecular potential energy 
as a sum of energy terms that describe the deviation of bond lengths, bond angles and 
torsion angles away from equilibrium values, plus terms for non-bonded pairs of atoms 
describing van der Waals and electrostatic interactions. The set of parameters consisting of 
equilibrium bond lengths, bond angles, partial charge values, force constants and van der 
Waals parameters is collectively known as a force field. Different implementations of 
molecular mechanics use slightly different mathematical expressions, and therefore, 
different constants for the potential function. The common force fields in use today have 
been developed by using high level quantum calculations and/or fitting to experimental 
data. The technique known as energy minimization is used to find positions of zero gradient 
for all atoms, in other words, a local energy minimum. Lower energy states are more stable 
and are commonly investigated because of their role in chemical and biological processes. A 
molecular dynamics simulation, on the other hand, computes the behaviour of a system as a 
function of time. It involves solving Newton's laws of motion, principally the second law, F 
= ma. Integration of Newton's laws of motion, using different integration algorithms, leads 
to atomic trajectories in space and time. The force on an atom is defined as the negative 
gradient of the potential energy function. The energy minimization technique is useful for 
obtaining a static picture for comparing between states of similar systems, while molecular 
dynamics provides information about the dynamic processes with the intrinsic inclusion of 
temperature effects. 
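
The two techniques described above can be illustrated with a minimal sketch in Python. It assumes a toy system that is not taken from the text: two atoms interacting through a single Lennard-Jones term in reduced units (epsilon = sigma = mass = 1). The function names, the step sizes and the choice of the velocity Verlet integrator are illustrative assumptions; production force fields and integrators are considerably more elaborate.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along the separation, F = -dU/dr (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r, step=0.01, tol=1e-8, max_iter=100000):
    """Energy minimization: follow the force until the gradient vanishes."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        r += step * force  # move downhill in energy
    return r


def velocity_verlet(r, v, mass=1.0, dt=0.001, n_steps=5000):
    """Molecular dynamics: integrate F = m*a and return the (r, v) trajectory."""
    force = lj_force(r)
    trajectory = []
    for _ in range(n_steps):
        v += 0.5 * dt * force / mass  # first half-step velocity update
        r += dt * v                   # full-step position update
        force = lj_force(r)           # force at the new position
        v += 0.5 * dt * force / mass  # second half-step velocity update
        trajectory.append((r, v))
    return trajectory


if __name__ == "__main__":
    print("minimum-energy separation:", steepest_descent(1.5))
    r_end, v_end = velocity_verlet(1.2, 0.0)[-1]
    print("total energy after MD run:", 0.5 * v_end ** 2 + lj_energy(r_end))
```

The minimization returns a single static geometry (the local minimum near r = 2^(1/6) sigma), whereas the dynamics run returns a time series whose total energy stays close to its starting value.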

Molecules can be modelled either in vacuum or in the presence of a solvent such as water. 
Simulations of systems in vacuum are referred to as gas-phase simulations, while those that 
include the presence of solvent molecules are referred to as explicit solvent simulations. In 
another type of simulation, the effect of solvent is estimated using an empirical 
mathematical expression; these are known as implicit solvation simulations. 

Molecular modelling methods are now routinely used to investigate the structure, dynamics 
and thermodynamics of inorganic, biological, and polymeric systems. The types of biological 
activity that have been investigated using molecular modelling include protein folding, 
enzyme catalysis, protein stability, conformational changes associated with biomolecular 
function, and molecular recognition of proteins, DNA, and membrane complexes. 
Popular software for molecular modelling 

Abalone 

AMBER 

ADF 

Ascalaph Designer 

BALLView 

Biskit 

BOSS 

Cerius2 

Chimera 

CHARMM 

Coot (program) for X-ray crystallography of biological molecules 

COSMOS (software) [3] 

CP2K 

CPMD 

Firefly 

GAMESS (UK) 

GAMESS (US) 

GAUSSIAN 

Ghemical 

GROMACS 

GROMOS 

InsightII 

LAMMPS 

MacroModel 

MarvinSpace 

Materials Studio 

MDynaMix 

MMTK 

MOE (software) [5] 

Molecular Docking Server 

Molsoft ICM [6] 

MOPAC 

NAMD 

NOCH 

Oscail X 

PyMOL 

Q-Chem 

Sirius 

SPARTAN (software) [7] 

STR3DI32 [8] 

Sybyl (software) [9] 

MCCCS Towhee [10] 

TURBOMOLE 

ReaxFF 

VMD 

WHATIF [11] 
• xeo [12] 

• YASARA [13] 

• Zodiac (software) [14] 

See also 

Cheminformatics 

Computational chemistry 

Density functional theory programs. 

Force field in Chemistry 

Force field implementation 

List of nucleic acid simulation software 

List of protein structure prediction software 

Molecular Design software 

Molecular dynamics 

Molecular graphics 

Molecular mechanics 

Molecular model 

Molecular modelling on GPU 

Molecule editor 

Monte Carlo method 

Quantum chemistry computer programs 

Semi-empirical quantum chemistry method 

Software for molecular mechanics modelling 

Structural Bioinformatics 

External links 

• Center for Molecular Modeling at the National Institutes of Health (NIH) [15] (U.S. 
Government Agency) 

• Molecular Simulation [16], details for the Molecular Simulation journal, ISSN: 0892-7022 
(print), 1029-0435 (online) 

• The eCheminfo Network and Community of Practice in Informatics and Modeling 
Homepage [17] 

[1] Agile Molecule (http://www.agilemolecule.com/index.html) 

[2] York Structural Biology Laboratory (http://www.ysbl.york.ac.uk/~emsley/coot/) 

[3] COSMOS (http://www.cosmos-software.de/ce_intro.html) - Computer Simulation of Molecular Structures 

[4] ChemAxon (http://www.chemaxon.com/product/mspace.html) 

[5] MOE - Molecular Operating Environment, Chemical Computing Group (http://www.chemcomp.com/) 

[6] Molsoft (http://www.molsoft.com/) 

[7] Wavefunction, Inc. (http://www.wavefun.com/) 

[8] Exorga, Inc. (http://www.exorga.com/) 

[9] Tripos (http://www.tripos.com/sybyl/) 

[10] MCCCS Towhee (http://towhee.sourceforge.net/) - Monte Carlo for Complex Chemical Systems 

[11] CMBI (http://swift.cmbi.ru.nl/whatif/) 
[12] xeo (http://sourceforge.net/projects/xeo) 
[13] YASARA (http://www.yasara.org/) 

[14] ZedeN (http://www.zeden.org) 

[15] http://cmm.info.nih.gov/modeling/ 

[16] http://www.tandf.co.uk/journals/titles/08927022.asp 

[17] http://www.echeminfo.com/ 

[18] http://www.amrita.edu/cen/ccmm 
Quantum chemistry is a branch of theoretical chemistry, which applies quantum 
mechanics and quantum field theory to address issues and problems in chemistry. The 
description of the electronic behavior of atoms and molecules as pertaining to their 
reactivity is one of the applications of quantum chemistry. Quantum chemistry lies on the 
border between chemistry and physics, and significant contributions have been made by 
scientists from both fields. It has a strong and active overlap with the field of atomic 
physics and molecular physics, as well as physical chemistry. 

Quantum chemistry mathematically describes the fundamental behavior of matter at the 
molecular scale. It is, in principle, possible to describe all chemical systems using this 
theory. In practice, only the simplest chemical systems may realistically be investigated in 
purely quantum mechanical terms, and approximations must be made for most practical 
purposes (e.g., Hartree-Fock, post Hartree-Fock or density functional theory; see 
computational chemistry for more details). Hence a detailed understanding of quantum 
mechanics is not necessary for most chemistry, as the important implications of the theory 
(principally the orbital approximation) can be understood and applied in simpler terms. 

In quantum mechanics the Hamiltonian, the operator corresponding to the total energy of a 
particle, can be expressed 
as the sum of two operators, one corresponding to kinetic energy and the other to potential 
energy. The Hamiltonian in the Schrödinger wave equation used in quantum chemistry does 
not contain terms for the spin of the electron. 

Solutions of the Schrödinger equation for the hydrogen atom give the form of the wave 
function for atomic orbitals, and the relative energy of the various orbitals. The orbital 
approximation can be used to understand other atoms, e.g. helium, lithium and carbon. 
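
As a small illustration of the relative orbital energies mentioned above, the following sketch evaluates the standard bound-state result for a one-electron (hydrogen-like) atom, E_n = -Z^2 R / n^2. The function name and the rounded value used for the Rydberg energy R are assumptions made for this example; the formula ignores the electron-electron repulsion that the orbital approximation has to handle in helium, lithium or carbon.

```python
RYDBERG_EV = 13.605693  # approximate Rydberg energy in electronvolts


def hydrogen_like_energy(n, z=1):
    """Energy in eV of level n for a one-electron atom with nuclear charge z."""
    if n < 1:
        raise ValueError("principal quantum number n must be >= 1")
    return -RYDBERG_EV * z ** 2 / n ** 2


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, hydrogen_like_energy(n))
```

In this model all orbitals with the same principal quantum number n share one energy, so the 2s and 2p levels of hydrogen coincide.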
History 

The history of quantum chemistry essentially began with the 1838 discovery of cathode 
rays by Michael Faraday, the 1859 statement of the black body radiation problem by Gustav 
Kirchhoff, the 1877 suggestion by Ludwig Boltzmann that the energy states of a physical 
system could be discrete, and the 1900 quantum hypothesis by Max Planck that any energy 
radiating atomic system can theoretically be divided into a number of discrete energy 
elements $\varepsilon$ such that each of these energy elements is proportional to the frequency $\nu$ with 
which they each individually radiate energy, as defined by the following formula: 

$$\varepsilon = h\nu$$

where h is a numerical value called Planck's constant. Then, in 1905, to explain the 
photoelectric effect (1839), i.e., that shining light on certain materials can function to eject 
electrons from the material, Albert Einstein postulated, based on Planck's quantum 
hypothesis, that light itself consists of individual quantum particles, which later came to be 
called photons (1926). In the years to follow, this theoretical basis slowly began to be 
applied to chemical structure, reactivity, and bonding. 

Electronic structure 

The first step in solving a quantum chemical problem is usually solving the Schrödinger 
equation (or Dirac equation in relativistic quantum chemistry) with the electronic molecular 
Hamiltonian. This is called determining the electronic structure of the molecule. It can be 
said that the electronic structure of a molecule or crystal implies essentially its chemical 
properties. 

Wave model 

The foundation of quantum mechanics and quantum chemistry is the wave model, in which 
the atom is a small, dense, positively charged nucleus surrounded by electrons. Unlike the 
earlier Bohr model of the atom, however, the wave model describes electrons as "clouds" 
moving in orbitals, and their positions are represented by probability distributions rather 
than discrete points. The strength of this model lies in its predictive power. Specifically, it 
predicts the pattern of chemically similar elements found in the periodic table. The wave 
model is so named because electrons exhibit properties (such as interference) traditionally 
associated with waves. See wave-particle duality. 

Valence bond 

Although the mathematical basis of quantum chemistry had been laid by Schrödinger in 
1926, it is generally accepted that the first true calculation in quantum chemistry was that 
of the German physicists Walter Heitler and Fritz London on the hydrogen (H2) molecule in 
1927. Heitler and London's method was extended by the American theoretical physicist 
John C. Slater and the American theoretical chemist Linus Pauling to become the 
Valence-Bond (VB) [or Heitler-London-Slater-Pauling (HLSP)] method. In this 
method, attention is primarily devoted to the pairwise interactions between atoms, and this 
method therefore correlates closely with classical chemists' drawings of bonds. 
Molecular orbital 

An alternative approach was developed in 1929 by Friedrich Hund and Robert S. Mulliken, 
in which electrons are described by mathematical functions delocalized over an entire 
molecule. The Hund-Mulliken approach or molecular orbital (MO) method is less 
intuitive to chemists, but has turned out capable of predicting spectroscopic properties 
better than the VB method. This approach is the conceptual basis of the Hartree-Fock 
method and further post Hartree-Fock methods. 

Density functional theory 

The Thomas-Fermi model was developed independently by Thomas and Fermi in 1927. 
This was the first attempt to describe many-electron systems on the basis of electronic 
density instead of wave functions, although it was not very successful in the treatment of 
entire molecules. The method did provide the basis for what is now known as density 
functional theory. Though this method is less developed than post Hartree-Fock methods, 
its lower computational requirements allow it to tackle larger polyatomic molecules and 
even macromolecules, which has made it the most used method in computational chemistry 
at present. 

Chemical dynamics 

A further step can consist of solving the Schrödinger equation with the total molecular 
Hamiltonian in order to study the motion of molecules. Direct solution of the Schrödinger 
equation is called quantum molecular dynamics; solution within the semiclassical approximation 
is called semiclassical molecular dynamics; and solution within the classical mechanics 
framework is called molecular dynamics (MD). Statistical approaches, using for example Monte 
Carlo methods, are also possible. 

Adiabatic chemical dynamics 


In adiabatic dynamics, interatomic interactions are represented by single scalar 
potentials called potential energy surfaces. This is the Born-Oppenheimer approximation 
introduced by Born and Oppenheimer in 1927. Pioneering applications of this in chemistry 
were performed by Rice and Ramsperger in 1927 and Kassel in 1928, and generalized into 
the RRKM theory in 1952 by Marcus who took the transition state theory developed by 
Eyring in 1935 into account. These methods enable simple estimates of unimolecular 
reaction rates from a few characteristics of the potential surface. 
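
To give a feel for how such an estimate works, the sketch below evaluates the classical Rice-Ramsperger-Kassel (RRK) expression k(E) = nu * ((E - E0) / E)^(s - 1). This is the older classical formula, not the full RRKM theory, and it is shown here only as an assumption-laden illustration: nu is a frequency factor, E0 is the threshold (barrier) energy read off the potential surface, s is the number of coupled oscillators, and the function name is invented for this example.

```python
def rrk_rate(energy, threshold, n_oscillators, frequency_factor):
    """Classical RRK unimolecular rate constant k(E).

    energy and threshold must share the same units; the result has the
    units of frequency_factor (for example s**-1).
    """
    if n_oscillators < 1:
        raise ValueError("n_oscillators must be >= 1")
    if energy <= threshold:
        return 0.0  # not enough energy to cross the barrier
    return frequency_factor * ((energy - threshold) / energy) ** (n_oscillators - 1)


if __name__ == "__main__":
    # Hypothetical numbers: barrier 1.0, total energy 2.0, 3 oscillators.
    print(rrk_rate(2.0, 1.0, 3, 1.0e13))
```

The rate is zero below the threshold and approaches the frequency factor as the energy grows far above it.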

Non-adiabatic chemical dynamics 

Non-adiabatic dynamics consists of taking into account the interaction between several coupled 
potential energy surfaces (corresponding to different electronic quantum states of the 
molecule). The coupling terms are called vibronic couplings. The pioneering work in this 
field was done by Stueckelberg, Landau, and Zener in the 1930s, in their work on what is 
now known as the Landau-Zener transition. Their formula allows the transition probability 
between two diabatic potential curves in the neighborhood of an avoided crossing to be 
calculated. 
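
The Landau-Zener formula can be sketched in a few lines. In its usual form the probability of a diabatic passage through the crossing (the system stays on its initial diabatic curve) is P = exp(-2 * pi * a^2 / (hbar * v * |F1 - F2|)), where a is the coupling between the two diabatic states, v is the nuclear velocity at the crossing and F1 - F2 is the difference of the slopes of the two diabatic curves. The function and parameter names below are assumptions for this example, and hbar defaults to 1 as in atomic units.

```python
import math


def landau_zener_probability(coupling, velocity, slope_difference, hbar=1.0):
    """Probability of staying on the initial diabatic curve at a crossing."""
    denominator = hbar * abs(velocity) * abs(slope_difference)
    if denominator == 0.0:
        raise ValueError("velocity and slope_difference must be non-zero")
    return math.exp(-2.0 * math.pi * coupling ** 2 / denominator)


if __name__ == "__main__":
    # Hypothetical values in atomic units.
    p_stay = landau_zener_probability(coupling=0.01, velocity=0.005,
                                      slope_difference=0.1)
    print("stay on diabatic curve:", p_stay)
    print("switch to the other diabatic curve:", 1.0 - p_stay)
```

A weak coupling or a fast passage gives a probability close to 1 (the system ignores the avoided crossing), while a strong coupling or a slow passage gives a probability close to 0 (the system follows the adiabatic surface).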
Quantum chemistry and quantum field theory 

The application of quantum field theory (QFT) to chemical systems and theories has become 
increasingly common in the modern physical sciences. One of the first and most 
fundamentally explicit appearances of this is seen in the theory of the photomagneton. In 
this system, plasmas, which are ubiquitous in both physics and chemistry, are studied in 
order to determine the basic quantization of the underlying bosonic field. However, 
quantum field theory is of interest in many fields of chemistry, including: nuclear chemistry, 
astrochemistry, sonochemistry, and quantum hydrodynamics. Field theoretic methods have 
also been critical in developing the ab initio Effective Hamiltonian theory of semi-empirical 
pi-electron methods. 

See also 

Atomic physics 

Computational chemistry 

Condensed matter physics 

International Academy of Quantum Molecular Science 

Physical chemistry 

Quantum chemistry computer programs 

Quantum electrochemistry 

QMC@Home 

Theoretical physics 

Further reading 

• Pauling, L. (1954). General Chemistry. Dover Publications. ISBN 0-486-65622-5. 

• Pauling, L., and Wilson, E. B. Introduction to Quantum Mechanics with Applications to 
Chemistry (Dover Publications) ISBN 0-486-64871-0 

• Atkins, P.W. Physical Chemistry (Oxford University Press) ISBN 0-19-879285-9 

• McWeeny, R. Coulson's Valence (Oxford Science Publications) ISBN 0-19-855144-4 

• Landau, L.D. and Lifshitz, E.M. Quantum Mechanics: Non-relativistic Theory (Course of 
Theoretical Physics vol. 3) (Pergamon Press) 

• Bernard Pullman and Alberte Pullman. 1963. Quantum Biochemistry. New York and 
London: Academic Press. 

• Eric R. Scerri, The Periodic Table: Its Story and Its Significance, Oxford University Press, 
2006. Considers the extent to which chemistry and especially the periodic system has 
been reduced to quantum mechanics. ISBN 0-19-530573-6. 

• Simon, Z. 1976. Quantum Biochemistry and Specific Interactions. Taylor & Francis; 
ISBN-13: 978-0856260872 and ISBN 0-85-6260878. 
External links 

• The Sherrill Group - Notes (http://vergil.chemistry.gatech.edu/notes/index.html) 

• ChemViz Curriculum Support Resources (http://www.shodor.org/chemviz/) 

• Early ideas in the history of quantum chemistry (http://www.quantum-chemistry-history.com/) 

Nobel lectures by quantum chemists 

• Walter Kohn's Nobel lecture (http://nobelprize.org/chemistry/laureates/1998/kohn-lecture.html) 

• Rudolph Marcus' Nobel lecture (http://nobelprize.org/chemistry/laureates/1992/marcus-lecture.html) 

• Robert Mulliken's Nobel lecture (http://nobelprize.org/chemistry/laureates/1966/mulliken-lecture.html) 

• Linus Pauling's Nobel lecture (http://nobelprize.org/chemistry/laureates/1954/pauling-lecture.html) 

• John Pople's Nobel lecture (http://nobelprize.org/chemistry/laureates/1998/pople-lecture.html) 

Molecular orbital theory 

In chemistry, molecular orbital theory (MO theory) is a method for determining 
molecular structure in which electrons are not assigned to individual bonds between atoms, 
but are treated as moving under the influence of the nuclei in the whole molecule. In this 
theory, each molecule has a set of molecular orbitals, in which it is assumed that the 
molecular orbital wave function $\psi_j$ may be written as a simple weighted sum of the n 
constituent atomic orbitals $\chi_i$, according to the following equation: 

$$\psi_j = \sum_{i=1}^{n} c_{ij} \chi_i$$

The $c_{ij}$ coefficients may be determined numerically by substitution of this equation into the 
Schrödinger equation and application of the variational principle. This method is called the 
linear combination of atomic orbitals approximation and is used in computational 
chemistry. An additional unitary transformation can be applied on the system to accelerate 
the convergence in some computational schemes. Molecular orbital theory was seen as a 
competitor to valence bond theory in the 1930s, before it was realized that the two methods 
are closely related and that when extended they become equivalent. 
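
As an illustration of how the coefficients and energies follow from the variational principle, the sketch below treats the smallest possible case: two normalized atomic orbitals. Substituting the weighted sum into the Schrödinger equation leads to the secular equation det(H - E S) = 0, which for two orbitals is a quadratic in E. The matrix elements h11, h22, h12 and the overlap s12 are assumed inputs (their numerical values in the usage lines are hypothetical), and the function name is invented for this example.

```python
import math


def lcao_two_orbital_energies(h11, h22, h12, s12):
    """Solve det(H - E*S) = 0 for two normalized atomic orbitals.

    H = [[h11, h12], [h12, h22]] and S = [[1, s12], [s12, 1]].
    Returns (lower_energy, upper_energy).
    """
    if abs(s12) >= 1.0:
        raise ValueError("overlap must satisfy |s12| < 1")
    # Expanding (h11 - E)(h22 - E) - (h12 - E*s12)**2 = 0 as a*E**2 + b*E + c = 0
    a = 1.0 - s12 * s12
    b = -(h11 + h22 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 * h12
    root = math.sqrt(b * b - 4.0 * a * c)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


if __name__ == "__main__":
    # Hypothetical homonuclear case: equal on-site energies alpha, coupling beta.
    alpha, beta, overlap = -0.5, -0.4, 0.3
    bonding, antibonding = lcao_two_orbital_energies(alpha, alpha, beta, overlap)
    print("bonding:", bonding, "antibonding:", antibonding)
```

For two identical orbitals the roots reduce to (alpha + beta) / (1 + s) and (alpha - beta) / (1 - s), the familiar bonding and anti-bonding combinations with equal and opposite coefficients respectively.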
History 

Molecular orbital theory was developed, in the years after valence bond theory (1927) had 
been established, primarily through the efforts of Friedrich Hund, Robert Mulliken, John C. 
Slater, and John Lennard-Jones. MO theory was originally called the Hund-Mulliken 
theory. The word orbital was introduced by Mulliken in 1932. By 1933, the molecular 
orbital theory had become accepted as a valid and useful theory. According to German 
physicist and physical chemist Erich Hückel, the first quantitative use of molecular orbital 
theory was the 1929 paper of Lennard-Jones. The first accurate calculation of a molecular 
orbital wavefunction was that made by Charles Coulson in 1938 on the hydrogen 
molecule. By 1950, molecular orbitals were completely defined as eigenfunctions (wave 
functions) of the self-consistent field Hamiltonian and it was at this point that molecular 
orbital theory became fully rigorous and consistent. This rigorous approach is known as 
the Hartree-Fock method for molecules although it had its origins in calculations on atoms. 
In calculations on molecules, the molecular orbitals are expanded in terms of an atomic 
orbital basis set, leading to the Roothaan equations. This led to the development of many 
ab initio quantum chemistry methods. Parallel to this rigorous development, molecular 
orbital theory was applied in an approximate manner using some empirically derived 
parameters in methods now known as semi-empirical quantum chemistry methods. 

Overview 

Molecular orbital (MO) theory uses a linear combination of atomic orbitals to form 
molecular orbitals which cover the whole molecule. These are often divided into bonding 
orbitals, anti-bonding orbitals, and non-bonding orbitals. A molecular orbital is merely a 
Schrödinger orbital which includes several, but often only two, nuclei. If this orbital is of a 
type in which the electron(s) in the orbital have a higher probability of being between 
nuclei than elsewhere, the orbital will be a bonding orbital, and will tend to hold the nuclei 
together. If the electrons tend to be present in a molecular orbital in which they spend 
more time elsewhere than between the nuclei, the orbital will function as an anti-bonding 
orbital and will actually weaken the bond. Electrons in non-bonding orbitals tend to be in 
deep orbitals (nearly atomic orbitals) associated almost entirely with one nucleus or the 
other, and thus they are as likely to be found between the nuclei as elsewhere. These 
electrons neither contribute to nor detract from bond strength. 

Molecular orbitals are further divided according to the types of atomic orbitals combining 
to form a bond. These orbitals are results of electron-nucleus interactions that are caused 
by the fundamental force of electromagnetism. Chemical substances will form a bond if 
their orbitals become lower in energy when they interact with each other. Different 
chemical bonds are distinguished that differ by electron cloud shape and by energy levels. 

MO theory provides a global, delocalized perspective on chemical bonding. For example, in 
the MO theory for hypervalent molecules it is unnecessary to invoke a major role for 
d-orbitals, whereas valence bond theory normally uses hybridization with d-orbitals to 
explain hypervalency. In MO theory, any electron in a molecule may be found anywhere in 
the molecule, since quantum conditions allow electrons to travel under the influence of an 
arbitrarily large number of nuclei, so long as permitted by certain quantum rules. Although 
in MO theory some molecular orbitals may hold electrons which are more localized between 
specific pairs of molecular atoms, other orbitals may hold electrons which are spread more 
uniformly over the molecule. Thus, overall, bonding (and electrons) are far more delocalized 
(spread out) in MO theory than is implied in valence bond (VB) theory. This makes MO 
theory more useful for the description of extended systems. 

An example is the MO picture of benzene, which is composed of a hexagonal ring of 6 carbon 
atoms. In this molecule, 24 of the 30 total valence bonding electrons are located in 12 σ 
(sigma) bonding orbitals which are mostly located between pairs of atoms (C-C or C-H), 
similar to the valence bond picture. However, in benzene the remaining 6 bonding electrons 
are located in 3 π (pi) molecular bonding orbitals that are delocalized around the ring. Two 
of these electrons are in an MO which has equal contributions from all 6 atoms. The other 
two orbitals have 
vertical nodes at right angles to each other. As in the VB theory, all of these 6 delocalized pi 
electrons reside in a larger space which exists above and below the ring plane. All 
carbon-carbon bonds in benzene are chemically equivalent. In MO theory this is a direct 
consequence of the fact that the 3 molecular pi orbitals form a combination which evenly 
spreads the extra 6 electrons over 6 carbon atoms. 

In molecules such as methane, the 8 valence electrons are found in 4 MOs that are spread 
out over all 5 atoms. However, it is possible to approximate the MOs with 4 localized 
orbitals similar in shape to the sp3 hybrid orbitals predicted by VB theory. This is often 
adequate for σ (sigma) bonds, but it is not possible for the π (pi) orbitals. However, the 
delocalized MO picture is more appropriate for ionization and spectroscopic predictions. 
Upon ionization of methane, a single electron is taken from the MO which surrounds the 
whole molecule, weakening all 4 bonds equally. VB theory would predict that one electron 
is removed from an sp3 orbital, resulting in the need for resonance between four valence 
bond structures, each of which has a one-electron bond. 

As in benzene, in substances such as beta carotene, chlorophyll or heme, some electrons in 
the π (pi) orbitals are spread out in molecular orbitals over long distances in a molecule, 
giving rise to light absorption at lower energies (visible colors), a fact which is observed. 
This and other spectroscopic data for molecules are better explained in MO theory, with an 
emphasis on electronic states associated with multicenter orbitals, including mixing of 
orbitals premised on principles of orbital symmetry matching. The same MO principles also 
more naturally explain some electrical phenomena, such as high electrical conductivity in 
the planar direction of the hexagonal atomic sheets that exist in graphite. In MO theory, 
"resonance" (a mixing and blending of VB bond states) is a natural consequence of 
symmetry. For example, in graphite, as in benzene, it is not necessary to invoke the sp2 
hybridization and resonance of VB theory in order to explain electrical conduction. Instead, 
MO theory simply recognizes that some electrons in the graphite atomic sheets are 
completely delocalized over arbitrary distances, and reside in very large molecular orbitals 
that cover an entire graphite sheet, and some electrons are thus as free to move and 
conduct electricity in the sheet plane, as if they resided in a metal. 
See also 

• Ab initio quantum chemistry methods 

• Atomic orbital 

• Configuration interaction 

• Coupled cluster 

• Hartree-Fock 



• Molecular orbital 

• MO diagram 

• Møller-Plesset perturbation theory 

• Quantum chemistry computer programs 

• Semi-empirical quantum chemistry methods 
Linear combination of atomic orbitals molecular orbital method 


A linear combination of atomic orbitals or LCAO is a quantum superposition of atomic 
orbitals and a technique for calculating molecular orbitals in quantum chemistry. In 
quantum mechanics, electron configurations of atoms are described as wavefunctions. In a 
mathematical sense, these wave functions are the basis set of functions, the basis functions, 
which describe the electrons of a given atom. In chemical reactions, orbital wavefunctions 
are modified, i.e. the electron cloud shape is changed, according to the type of atoms 
participating in the chemical bond. 

It was introduced in 1929 by Sir John Lennard-Jones with the description of bonding in the 
diatomic molecules of the first main row of the periodic table, but had been used earlier by 
Linus Pauling for H2+. [2] [3] 

A mathematical description is 

$$\phi_i = c_{1i}\chi_1 + c_{2i}\chi_2 + c_{3i}\chi_3 + \cdots + c_{ni}\chi_n$$

or 

$$\phi_i = \sum_{r} c_{ri}\chi_r$$

where $\phi_i$ (phi) is a molecular orbital represented as the sum of n atomic orbitals $\chi_r$ (chi), 
each multiplied by a corresponding coefficient $c_{ri}$. The coefficients are the weights of the 
contributions of the n atomic orbitals to the molecular orbital. The Hartree-Fock procedure 
is used to obtain the coefficients of the expansion. 
The orbitals are thus expressed as linear combinations of basis functions, and the basis 
functions are one-electron functions centered on nuclei of the component atoms of the 
molecule. The atomic orbitals used are typically those of hydrogen-like atoms, since these 
are known analytically (i.e. Slater-type orbitals), but other choices are possible, such as Gaussian 
functions from standard basis sets. 

By minimizing the total energy of the system, an appropriate set of coefficients of the linear 
combinations is determined. This quantitative approach is now known as the Hartree-Fock 
method. However, since the development of computational chemistry, the LCAO method 
often refers not to an actual optimization of the wave function but to a qualitative 
discussion which is very useful for predicting and rationalizing results obtained via more 
modern methods. In this case, the shape of the molecular orbitals and their respective 
energies are deduced approximately from comparing the energies of the atomic orbitals of 
the individual atoms (or molecular fragments) and applying some recipes known as level 
repulsion and the like. The graphs that are plotted to make this discussion clearer are 
called correlation diagrams. The required atomic orbital energies can come from 
calculations or directly from experiment via Koopmans' theorem. 

This is done by using the symmetry of the molecules and orbitals involved in bonding. The 
first step in this process is assigning a point group to the molecule. A common example is 
water, which is of C2v symmetry. Then a reducible representation of the bonding is 
determined, as demonstrated below for water: 

C2v     E     C2    σv(xz)    σv'(yz) 
Γσ      2     0     0         2 

Γσ = A1 + B2 

Each operation in the point group is performed upon the molecule. The number of bonds 
that are unmoved is the character of that operation. This reducible representation is 
decomposed into the sum of irreducible representations. These irreducible representations 
correspond to the symmetry of the orbitals involved. 
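
The decomposition step can be sketched with the standard reduction formula, n(irrep) = (1/h) * sum over operations of chi_reducible * chi_irrep, where h is the order of the group. The example below hard-codes the C2v character table with the operations ordered E, C2, σv(xz), σv'(yz), and assumes that the water molecule lies in the yz plane, so that the two O-H bonds have the characters (2, 0, 0, 2). The names used in the code are illustrative only.

```python
# C2v character table; operations ordered E, C2, sigma_v(xz), sigma_v'(yz)
C2V_CHARACTERS = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}


def reduce_c2v(reducible):
    """Return how many times each C2v irreducible representation occurs."""
    order = 4  # number of symmetry operations in C2v
    counts = {}
    for label, irrep in C2V_CHARACTERS.items():
        total = sum(g * i for g, i in zip(reducible, irrep))
        count, remainder = divmod(total, order)
        if remainder != 0:
            raise ValueError("characters do not form a valid representation")
        counts[label] = count
    return counts


if __name__ == "__main__":
    # Characters of the two O-H bonds of water (bonds left unmoved by each operation)
    print(reduce_c2v((2, 0, 0, 2)))
```

For the two O-H bonds the only non-zero counts are those of A1 and B2, one each.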

MO diagrams provide simple qualitative LCAO treatment. 






Quantitative theories are the Hückel method, the extended Hückel method and the 
Pariser-Parr-Pople method. 
See also 

• Quantum chemistry computer programs 

• Hartree-Fock 

• Basis set (chemistry) 

• Tight binding 

External links 

• LCAO @ chemistry.umeche.maine.edu 
The Hückel method or Hückel molecular orbital method (HMO), proposed by Erich 
Hückel in 1930, is a very simple linear combination of atomic orbitals molecular orbitals 
(LCAO MO) method for the determination of energies of molecular orbitals of pi electrons in 
conjugated hydrocarbon systems, such as ethene, benzene and butadiene. [1] [2] It is the 
theoretical basis for Hückel's rule; the extended Hückel method developed by Roald 
Hoffmann is the basis of the Woodward-Hoffmann rules. It was later extended to 
conjugated molecules such as pyridine, pyrrole and furan that contain atoms other than 
carbon, known in this context as heteroatoms. 

It is a very powerful educational tool and details appear in many chemistry textbooks. 

Hückel characteristics

The method has several characteristics:

• It limits itself to conjugated hydrocarbons.

• Only pi electron MOs are included, because these determine the general properties of
these molecules; the sigma electrons are ignored. This is referred to as sigma-pi
separability.

• The method takes as inputs the LCAO MO method, the Schrödinger equation and
simplifications based on orbital symmetry considerations. Interestingly, the method does
not take in any physical constants.

• The method predicts how many energy levels exist for a given molecule and which levels are
degenerate, and it expresses the MO energies as the sum of two other energy terms:
alpha, the energy of an electron in a 2p orbital, and beta, an interaction energy
between two p orbitals. Both remain unknown, but importantly they have become
independent of the molecule. In addition, it enables calculation of the charge density for each
atom in the pi framework, the bond order between any two atoms and the overall
molecular dipole moment.

Hückel results

The results for a few simple molecules are tabulated below:

Molecule        | Energy           | Frontier orbital | HOMO-LUMO energy gap
Ethylene        | E1 = α − β       | LUMO             | −2β
                | E2 = α + β       | HOMO             |
Butadiene       | E1 = α + 1.62β   |                  |
                | E2 = α + 0.62β   | HOMO             | −1.24β
                | E3 = α − 0.62β   | LUMO             |
                | E4 = α − 1.62β   |                  |
Benzene         | E1 = α + 2β      |                  |
                | E2 = α + β       |                  |
                | E3 = α + β       | HOMO             | −2β
                | E4 = α − β       | LUMO             |
                | E5 = α − β       |                  |
                | E6 = α − 2β      |                  |
Cyclobutadiene  | E1 = α + 2β      |                  |
                | E2 = α           | SOMO             |
                | E3 = α           | SOMO             |
                | E4 = α − 2β      |                  |

Table 1. Hückel method results. Lowest energies on top; α and β are both negative values. [5]





The theory predicts two energy levels for ethylene, with its two pi electrons filling the
low-energy HOMO while the high-energy LUMO remains empty. In butadiene the 4 pi
electrons occupy the 2 low-energy MOs out of a total of 4, and for benzene 6 energy levels are
predicted, four of them forming two degenerate pairs.

For linear and cyclic systems (with n atoms), general solutions exist.

Linear: $E_k = \alpha + 2\beta \cos\frac{k\pi}{n+1}, \quad k = 1, \dots, n$

Cyclic: $E_k = \alpha + 2\beta \cos\frac{2k\pi}{n}, \quad k = 0, \dots, n-1$
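
As a quick check of these general solutions, the sketch below evaluates the standard closed forms E_k = α + 2β cos(kπ/(n+1)) for a chain (k = 1..n) and E_k = α + 2β cos(2kπ/n) for a ring (k = 0..n−1). It returns only the coefficient x in E = α + xβ, so no numerical values for α or β are needed; the function names are illustrative.

```python
import math


def linear_levels(n):
    """Coefficients x_k of E_k = alpha + x_k * beta for a chain of n atoms."""
    return [2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]


def cyclic_levels(n):
    """Coefficients x_k of E_k = alpha + x_k * beta for a ring of n atoms.

    Sorted from the largest coefficient down; because beta is negative,
    this lists the lowest energies first.
    """
    return sorted((2 * math.cos(2 * k * math.pi / n) for k in range(n)),
                  reverse=True)


print([round(x, 2) for x in linear_levels(4)])  # butadiene
print([round(x, 2) for x in cyclic_levels(6)])  # benzene
```

The butadiene and benzene coefficients obtained this way can be compared directly with the entries of Table 1.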



Many predictions have been experimentally verified: 

• The HOMO-LUMO gap in terms of the β constant correlates directly with the respective
molecular electronic transitions observed with UV/VIS spectroscopy. For linear polyenes
the energy gap is given as:

$\Delta E = -4\beta \sin\frac{\pi}{2(n+1)}$

from which a value for β can be obtained between -60 and -70 kcal/mol (-250 to
-290 kJ/mol). [7]

• The predicted MO energies as stipulated by Koopmans' theorem correlate with
photoelectron spectroscopy. [8]

• The Hückel delocalization energy correlates with the experimental heat of combustion.
This energy is defined as the difference between the total predicted pi energy (in
benzene 8β) and a hypothetical pi energy in which all ethylene units are assumed isolated,
each contributing 2β (making benzene 3 × 2β = 6β), so the delocalization energy of
benzene is 2β.

• Molecules with MOs paired up such that only the sign differs (for example α ± β) are
called alternant hydrocarbons and have in common small molecular dipole moments.
This is in contrast to non-alternant hydrocarbons such as azulene and fulvene, which have
large dipole moments. Hückel theory is more accurate for alternant hydrocarbons.

• For cyclobutadiene the theory predicts that the two high-energy electrons occupy a
degenerate pair of MOs that are neither stabilized nor destabilized. Hence the square
molecule would be a very reactive triplet diradical (the ground state is actually
rectangular without degenerate orbitals). In fact, all cyclic conjugated hydrocarbons with
a total of 4n pi electrons share this MO pattern, and this forms the basis of Hückel's rule.
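
Two of the quantities in the list above are easy to reproduce numerically. The sketch below assumes the linear-polyene gap expression ΔE = −4β sin(π/(2(n+1))) and fills the ring levels E_k = α + 2β cos(2kπ/n) with two electrons each, lowest energy first. All results are coefficients of β (the α terms cancel in both comparisons), and the function names are illustrative.

```python
import math


def homo_lumo_gap(n):
    """HOMO-LUMO gap of a linear polyene with n (even) atoms, in units of beta."""
    return -4 * math.sin(math.pi / (2 * (n + 1)))


def pi_energy_cyclic(n):
    """Beta coefficient of the total pi energy of a neutral ring of n atoms."""
    # Largest coefficient first: beta is negative, so these are the lowest levels.
    levels = sorted((2 * math.cos(2 * k * math.pi / n) for k in range(n)),
                    reverse=True)
    electrons = n  # one pi electron per atom
    total = 0.0
    for x in levels:
        occupancy = min(2, electrons)
        total += occupancy * x
        electrons -= occupancy
    return total


print(round(homo_lumo_gap(4), 2))  # butadiene gap coefficient
# Benzene delocalization energy: ring total minus three isolated
# ethylene units, each contributing 2 beta.
print(round(pi_energy_cyclic(6) - 3 * 2, 2))
```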

Mathematics behind the Hückel method

The Hückel method can be derived from the Ritz method with a few further assumptions
concerning the overlap matrix S and the Hamiltonian matrix H. 

It is assumed that the overlap matrix S is the identity matrix. This means that overlap 
between the orbitals is neglected and the orbitals are considered orthogonal. Then the 
generalised eigenvalue problem of the Ritz method turns into an eigenvalue problem. 

The Hamiltonian matrix $H = (H_{ij})$ is parametrised in the following way:

$H_{ii} = \alpha$ for C atoms and $\alpha + h_A \beta$ for other atoms A.

$H_{ij} = \beta$ if the two atoms are next to each other and both C, and $k_{AB} \beta$ for other neighbouring
atoms A and B.

$H_{ij} = 0$ in any other case.

The orbitals are the eigenvectors and the energies are the eigenvalues of the Hamiltonian
matrix. If the substance is a pure hydrocarbon the problem can be solved without any
knowledge about the parameters. For heteroatom systems, such as pyridine, values of $h_A$
and $k_{AB}$ have to be specified.
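
The parametrisation above can be sketched in code by measuring energies in units of β relative to α, so that carbon diagonal elements are 0 and C-C neighbour elements are 1; an eigenvalue x then corresponds to E = α + xβ. The dictionaries `h` and `k` (holding h_A and k_AB) and the small Jacobi eigenvalue routine are illustrative choices made to keep the example free of dependencies; a library eigen-solver would normally be used instead.

```python
import math


def huckel_matrix(n_atoms, bonds, h=None, k=None):
    """Hückel matrix in units of beta, with alpha taken as the energy zero.

    h maps an atom index to h_A (0 for carbon);
    k maps a bond (i, j) to k_AB (1 for a C-C bond).
    """
    h = h or {}
    k = k or {}
    m = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        m[i][i] = h.get(i, 0.0)
    for i, j in bonds:
        m[i][j] = m[j][i] = k.get((i, j), 1.0)
    return m


def jacobi_eigenvalues(a, sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes a[p][q]; kept within [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for r in range(n):  # rotate columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # rotate rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    # Largest coefficient first, i.e. lowest energy first for negative beta.
    return sorted((a[i][i] for i in range(n)), reverse=True)


butadiene = huckel_matrix(4, [(0, 1), (1, 2), (2, 3)])
print([round(x, 2) for x in jacobi_eigenvalues(butadiene)])
```

For a heteroatom system, the same functions would be called with non-empty `h` and `k`, using whatever parameter values are chosen for the atoms involved.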



Hückel solution for ethylene

In the Hückel treatment for ethylene, the molecular orbital $\Psi$ is a linear combination of
the 2p atomic orbitals $\phi$ at carbon with their ratios $c$:

$\Psi = c_1 \phi_1 + c_2 \phi_2$

This equation is substituted in the Schrödinger equation:

$H\Psi = E\Psi$

with $H$ the Hamiltonian and $E$ the energy corresponding to the molecular orbital,
to give:

$H c_1 \phi_1 + H c_2 \phi_2 = E c_1 \phi_1 + E c_2 \phi_2$

This equation is multiplied by $\phi_1$ and integrated to give the equation:

$c_1 (H_{11} - E S_{11}) + c_2 (H_{12} - E S_{12}) = 0$

The same equation is multiplied by $\phi_2$ and integrated to give the equation:

$c_1 (H_{21} - E S_{12}) + c_2 (H_{22} - E S_{22}) = 0$

where:

$H_{ij} = \int \phi_i H \phi_j \, dv$

$S_{ij} = \int \phi_i \phi_j \, dv$

All diagonal Hamiltonian integrals $H_{ii}$ are called Coulomb integrals and those of type $H_{ij}$,
where atoms i and j are connected, are called resonance integrals, with these
relationships:

$H_{11} = H_{22} = \alpha$

$H_{12} = H_{21} = \beta$

Other assumptions are that each atomic orbital is normalized and that the overlap integral
between the two atomic orbitals is zero:

$S_{11} = S_{22} = 1$

$S_{12} = 0$

leading to these two homogeneous equations:

$c_1 (\alpha - E) + c_2 \beta = 0$

$c_1 \beta + c_2 (\alpha - E) = 0$

with a total of five variables. After converting this set to matrix notation:

$\begin{pmatrix} \alpha - E & \beta \\ \beta & \alpha - E \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0$

the trivial solution gives both wavefunction coefficients $c$ equal to zero, which is not useful,
so the other (non-trivial) solution is:

$\begin{vmatrix} \alpha - E & \beta \\ \beta & \alpha - E \end{vmatrix} = 0$

which can be solved by expanding its determinant:

$(\alpha - E)^2 - \beta^2 = 0$

or

$(\alpha - E)^2 = \beta^2$

and

$\alpha - E = \pm\beta$

$E = \alpha \pm \beta$

After normalization the coefficients are obtained:

$c_1 = c_2 = \frac{1}{\sqrt{2}}$

The constant β in the energy term is negative; therefore α + β is the lower energy,
corresponding to the HOMO, and α − β is the LUMO energy.

External links

• Hückel method @ chem.swin.edu.au

Further reading 

• The HMO-Model and its applications: Basis and Manipulation, E. Heilbronner and H. 
Bock, English translation, 1976, Verlag Chemie. 

• The HMO-Model and its applications: Problems with Solutions, E. Heilbronner and H. 
Bock, English translation, 1976, Verlag Chemie. 

• The HMO-Model and its applications: Tables of Hückel Molecular Orbitals, E.
Heilbronner and H. Bock, English translation, 1976, Verlag Chemie.

Extended Hückel method

The extended Hückel method is a semiempirical quantum chemistry method, developed
by Roald Hoffmann since 1963. It is based on the Hückel method but, while the original
Hückel method only considers pi orbitals, the extended method also includes the sigma
orbitals.

The extended Hückel method can be used for determining the molecular orbitals, but it is
not very successful in determining the structural geometry of an organic molecule. It can
however determine the relative energy of different geometrical configurations. It involves
calculations of the electronic interactions in a rather simple way where the
electron-electron repulsions are not explicitly included and the total energy is just a sum of
terms for each electron in the molecule. The off-diagonal Hamiltonian matrix elements are
given by an approximation due to Wolfsberg and Helmholz that relates them to the diagonal
elements and the overlap matrix element:

$H_{ij} = K S_{ij} \frac{H_{ii} + H_{jj}}{2}$

K is the Wolfsberg-Helmholz constant, and is usually given a value of 1.75. In the extended
Hückel method, only valence electrons are considered; the core electron energies and
functions are assumed to be more or less constant between atoms of the same type. The
method uses a series of parametrized energies calculated from atomic ionization potentials
or theoretical methods to fill the diagonal of the Fock matrix. After filling the non-diagonal
elements and diagonalizing the resulting Fock matrix, the energies (eigenvalues) and
wavefunctions (eigenvectors) of the valence orbitals are found.

It is common in many theoretical studies to use the extended Hückel molecular orbitals as a
preliminary step to determining the molecular orbitals by a more sophisticated method
such as the CNDO/2 method and ab initio quantum chemistry methods. Since the EHT basis
set is fixed, the calculated one-electron wavefunctions must be projected onto the basis set
in which the accurate calculation is to be done. One usually does this by fitting the orbitals
in the new basis to the old ones by the least squares method. As only valence electron
wavefunctions are found by this method, one must fill the core electron functions by
orthonormalizing the rest of the basis set with the calculated orbitals and then selecting the
ones with the lowest energy. This leads to the determination of more accurate structures and
electronic properties, or in the case of ab initio methods, to somewhat faster convergence.
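
The Wolfsberg-Helmholz construction described above, H_ij = K S_ij (H_ii + H_jj)/2 with K = 1.75, can be sketched as follows. The function name is illustrative, and the two-orbital example uses the hydrogen ionization energy for the diagonal together with a made-up overlap value, so the numbers are for demonstration only.

```python
def wolfsberg_helmholz(h_diag, overlap, k=1.75):
    """Build a Hamiltonian matrix from diagonal energies and overlaps.

    Diagonal elements are taken from h_diag; off-diagonal elements are
    H_ij = K * S_ij * (H_ii + H_jj) / 2.
    """
    n = len(h_diag)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                h[i][j] = h_diag[i]
            else:
                h[i][j] = k * overlap[i][j] * (h_diag[i] + h_diag[j]) / 2
    return h


# Two hydrogen 1s orbitals. -13.6 eV is the hydrogen ionization energy;
# the overlap of 0.6 is an arbitrary illustrative value, not a computed one.
diagonal = [-13.6, -13.6]
overlap = [[1.0, 0.6], [0.6, 1.0]]
for row in wolfsberg_helmholz(diagonal, overlap):
    print(row)
```

Setting the two diagonal energies equal reduces the expression to the simpler proportional form H_ij = K' S_ij, with K' = K H_ii.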

The method was first used by Roald Hoffmann who developed, with Robert Burns
Woodward, rules for elucidating reaction mechanisms (the Woodward-Hoffmann rules). He
used pictures of the molecular orbitals from extended Hückel theory to work out the orbital
interactions in these cycloaddition reactions.

A closely similar method was used earlier by Hoffmann and William Lipscomb for studies of
boron hydrides. The off-diagonal Hamiltonian matrix elements were given as
proportional to the overlap integral:

$H_{ij} = K S_{ij}$

This simplification of the Wolfsberg and Helmholz approximation is reasonable for boron
hydrides as the diagonal elements are reasonably similar due to the small difference in
electronegativity between boron and hydrogen.

The method works poorly for molecules that contain atoms of very different
electronegativity. To overcome this weakness, several groups have suggested iterative
schemes that depend on the atomic charge. One such method, which is still widely used in
inorganic and organometallic chemistry, is the Fenske-Hall method.

A recent program for the extended Hückel method is YAeHMOP, which stands for "yet
another extended Hückel molecular orbital package".

Molecular graphics



Molecular graphics (MG) is the discipline and philosophy of studying molecules and their
properties through graphical representation. [1] IUPAC limits the definition to
representations on a "graphical display device". Ever since Dalton's atoms and Kekulé's
benzene, there has been a rich history of hand-drawn atoms and molecules, and these 
representations have had an important influence on modern molecular graphics. This 
article concentrates on the use of computers to create molecular graphics. Note, however, 
that many molecular graphics programs and systems have close coupling between the 
graphics and editing commands or calculations such as in molecular modelling. 



Relation to molecular models 

There has been a long tradition of creating 
molecular models from physical materials. 
Perhaps the best known is Crick and 
Watson's model of DNA built from rods and 
planar sheets, but the most widely used 
approach is to represent all atoms and 
bonds explicitly using the "ball and stick" 
approach. This can demonstrate a wide 
range of properties, such as shape, relative 
size, and flexibility. Many chemistry 
courses expect that students will have 
access to ball and stick models. One goal of 
mainstream molecular graphics has been to 
represent the "ball and stick" model as 
realistically as possible and to couple this 
with calculations of molecular properties. 

Figure 1 shows a small molecule (NH3CH2CH2C(OH)(PO3H)(PO3H)-), as drawn by the Jmol
program. It is important to realise that the colours are purely a convention. Molecules can 
never be visible under any light microscope and atoms are not coloured, do not have hard 
surfaces and do not reflect light. Bonds are not rod-shaped. If physical molecular models 
had not existed, it is unlikely that molecular graphics would currently use this metaphor. 




Fig. 1. Key: Hydrogen = white, carbon = grey, nitrogen = blue, oxygen = red, and
phosphorus = orange.



Comparison of physical models with molecular graphics 

Physical models and computer models have partially complementary strengths and 
weaknesses. Physical models can be used by those without access to a computer and now 
can be made cheaply out of plastic materials. Their tactile and visual aspects cannot be 
easily reproduced by computers (although haptic devices have occasionally been built). On 
a computer screen, the flexibility of molecules is also difficult to appreciate; illustrating the
pseudorotation of cyclohexane is a good example of the value of mechanical models.

However, it is difficult to build large physical molecules, and all-atom physical models of 
even simple proteins could take weeks or months to build. Moreover, physical models are 
not robust and they decay over time. Molecular graphics is particularly valuable for 
representing global and local properties of molecules, such as electrostatic potential. 
Graphics can also be animated to represent molecular processes and chemical reactions, a 
feat that is not easy to reproduce physically. 

History 

Initially the rendering was on early CRT screens or through plotters drawing on paper. 
Molecular structures have always been an attractive choice for developing new computer 
graphics tools, since the input data are easy to create and the results are usually highly 
appealing. The first example of MG was a display of a protein molecule (Project MAC, 1966)
by Cyrus Levinthal and Robert Langridge. Among the milestones in high-performance MG
was the work of Nelson Max in "realistic" rendering of macromolecules using reflecting 
spheres. 

By about 1980 many laboratories both in academia and industry had recognized the power 
of the computer to analyse and predict the properties of molecules, especially in materials 
science and the pharmaceutical industry. The discipline was often called "molecular 
graphics" and in 1982 a group of academics and industrialists in the UK set up the 
Molecular Graphics Society (MGS). Initially much of the technology concentrated on
high-performance 3D graphics, including interactive rotation or 3D rendering of atoms as
spheres (sometimes with radiosity). During the 1980s a number of programs for calculating
molecular properties (such as molecular dynamics and quantum mechanics) became 
available and the term "molecular graphics" often included these. As a result the MGS has 
now changed its name to the Molecular Graphics and Modelling Society (MGMS). 

The requirements of macromolecular crystallography also drove MG because the traditional 
techniques of physical model-building could not scale. Alwyn Jones' FRODO program (and
later "O") was developed to overlay the molecular electron density determined from X-ray
crystallography and the hypothetical molecular structure.

Art, science and technology in molecular graphics

Both computer technology and graphic arts have 
contributed to molecular graphics. The development 
of structural biology in the 1950s led to a 
requirement to display molecules with thousands of 
atoms. The existing computer technology was 
limited in power, and in any case a naive depiction 
of all atoms left viewers overwhelmed. Most systems 
therefore used conventions where information was 
implicit or stylistic. Two vectors meeting at a point 
implied an atom or (in macromolecules) a complete 
residue (10-20 atoms). 

The macromolecular approach was popularized by 
Dickerson and Geis' presentation of proteins and the 
graphic work of Jane Richardson through 
high-quality hand-drawn diagrams such as the 
"ribbon" representation. In this they strove to 
capture the intrinsic 'meaning' of the molecule. This 
search for the "messages in the molecule" has 
always accompanied the increasing power of 
computer graphics processing. Typically the 
depiction would concentrate on specific areas of the 
molecule (such as the active site) and this might 
have different colours or more detail in the number 
of explicit atoms or the type of depiction (e.g., 
spheres for atoms). 




Fig. 2. Image of hemagglutinin with alpha helices depicted as cylinders and the rest of
the chain as silver coils. The individual protein atoms (several thousand) have been
hidden. All of the non-hydrogen atoms in the two ligands (presumably sialic acid) have
been shown near the top of the diagram. Key: Carbon = grey, oxygen = red, nitrogen = blue.



In some cases the limitations of technology have led 
to serendipitous methods for rendering. Most early graphics devices used vector graphics, 
which meant that rendering spheres and surfaces was impossible. Michael Connolly's 
program "MS" calculated points on the solvent-accessible surface of a molecule, and the
points were rendered as dots with good visibility using the new vector graphics technology, 
such as the Evans and Sutherland PS300 series. Thin sections ("slabs") through the 
structural display showed very clearly the complementarity of the surfaces for molecules 
binding to active sites, and the "Connolly surface" became a universal metaphor. 

The relationship between the art and science of molecular graphics is shown in the
exhibitions sponsored by the Molecular Graphics Society. Some exhibits are created with
molecular graphics programs alone, while others are collages, or involve physical materials.
An example from Mike Hann (1994), inspired by Magritte's painting Ceci n'est pas une
pipe, uses an image of a salmeterol molecule.

"Ceci n'est pas une molecule," writes Mike Hann, "serves to remind us that all of the 
graphics images presented here are not molecules, not even pictures of molecules, but 
pictures of icons which we believe represent some aspects of the molecule's properties."

Space-filling models

Fig. 4 is a "space-filling" representation of formic acid, where atoms are drawn to suggest
the amount of space they occupy. This is necessarily an icon: in the quantum mechanical
representation of molecules, there are only (positively charged) nuclei and a "cloud" of
negative electrons. The electron cloud defines an approximate size for the molecule, though
there can be no single precise definition of size. For many years the size of atoms has been
approximated by mechanical models (CPK), where the atoms have been represented by
plastic spheres whose radius (van der Waals radius) describes a sphere within which "most"
of the electron density can be found. These spheres could be clicked together to show the
steric aspects of the molecule rather than the positions of the nuclei. Fig. 4 shows the
intricacy required to make sure that all spheres intersect correctly, and also demonstrates
a reflective model.

Fig. 4. Space-filling model of formic acid. Key: Hydrogen = white, carbon = black,
oxygen = red.

Fig. 5. A molecule (zirconocene) where part (left) is rendered as ball-and-stick and part
(right) as an isosurface.



Since the atomic radii (e.g. in Fig. 4) are only slightly 
less than the distance between bonded atoms, the 
iconic spheres intersect, and in the CPK models, this 
was achieved by planar truncations along the bonding 
directions, the section being circular. When raster 
graphics became affordable, one of the common 
approaches was to replicate CPK models in silico. It is 
relatively straightforward to calculate the circles of 
intersection, but more complex to represent a model 
with hidden surface removal. A useful side product is 
that a conventional value for the molecular volume can 
be calculated. 
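
The circle of intersection mentioned above follows from elementary geometry: for two spheres of radii r1 and r2 whose centres are a distance d apart, the cutting plane lies at a = (d² + r1² − r2²)/(2d) from the first centre and the circle has radius sqrt(r1² − a²). The sketch below is a minimal illustration of that formula; the function name is an assumption, and the input values in the example are arbitrary.

```python
import math


def intersection_circle(r1, r2, d):
    """Circle in which two spheres intersect.

    Returns (a, rho), where a is the distance of the cutting plane from
    the centre of sphere 1 and rho is the radius of the circle, or None
    when the spheres are separate, touching, coincident or nested.
    """
    if d <= 0 or d >= r1 + r2 or d <= abs(r1 - r2):
        return None
    a = (d * d + r1 * r1 - r2 * r2) / (2 * d)
    rho = math.sqrt(r1 * r1 - a * a)
    return a, rho


# Two unit spheres whose centres are one unit apart.
print(intersection_circle(1.0, 1.0, 1.0))
```

In a space-filling renderer this would be evaluated for each bonded pair, using the atoms' van der Waals radii and the bond length.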



The use of spheres is often for convenience, being 
limited both by graphics libraries and the additional effort required to compute complete 
electronic density or other space-filling quantities. It is now relatively common to see 
images of isosurfaces that have been coloured to show quantities such as electrostatic 
potential. The commonest isosurfaces are the Connolly surface, or the volume within which 
a given proportion of the electron density lies. The isosurface in Fig. 5 appears to show the 
electrostatic potential, with blue colours being negative and red/yellow (near the metal) 
positive. (There is no absolute convention of colouring, and red/positive, blue/negative are 
often confusingly reversed!) Opaque isosurfaces do not allow the atoms to be seen and 
identified and it is not easy to deduce them. Because of this, isosurfaces are often drawn
with a degree of transparency.

Technology

Molecular graphics has always pushed the limits of display technology, and has seen a 
number of cycles of integration and separation of compute-host and display. Early systems 
like Project MAC were bespoke and unique, but in the 1970s the MMS-X and similar 
systems used (relatively) low-cost terminals, such as the Tektronix 4014 series, often over 
dial-up lines to multi-user hosts. The devices could only display static pictures but were
able to evangelize MG. In the late 1970s, it was possible for departments (such as
crystallography) to afford their own hosts (e.g., PDP-11) and to attach a display (such as
Evans & Sutherland's MPS) directly to the bus. The display list was kept on the host, and
interactivity was good since updates were rapidly reflected in the display, at the cost of
reducing most machines to a single-user system.

In the early 1980s, Evans & Sutherland (E&S) decoupled their PS300 display, which 
contained its own display information transformable through a dataflow architecture. 
Complex graphical objects could be downloaded over a serial line (e.g. 9600 baud) and then 
manipulated without impact on the host. The architecture was excellent for high 
performance display but very inconvenient for domain-specific calculations, such as 
electron-density fitting and energy calculations. Many crystallographers and modellers 
spent arduous months trying to fit such activities into this architecture. 

The benefits for MG were considerable, but by the later 1980s, UNIX workstations such as 
Sun-3 with raster graphics (initially at a resolution of 256 by 256) had started to appear. 
Computer-assisted drug design in particular required raster graphics for the display of 
computed properties such as atomic charge and electrostatic potential. Although E&S had a 
high-end range of raster graphics (primarily aimed at the aerospace industry) they failed to 
respond to the low-end market challenge where single users, rather than engineering 
departments, bought workstations. As a result the market for MG displays passed to Silicon 
Graphics, coupled with the development of minisupercomputers (e.g., CONVEX and Alliant) 
which were affordable for well-supported MG laboratories. Silicon Graphics provided a 
graphics language, IrisGL, which was easier to use and more productive than the PS300 
architecture. Commercial companies (e.g., Biosym, Polygen/MSI) ported their code to 
Silicon Graphics, and by the early 1990s, this was the "industry standard". 

Stereoscopic displays were developed based on liquid crystal polarized spectacles, and 
while this had been very expensive on the PS300, it now became a commodity item. A 
common alternative was to add a polarizable screen to the front of the display and to 
provide viewers with extremely cheap spectacles with orthogonal polarization for separate 
eyes. With projectors such as Barco, it was possible to project stereoscopic display onto 
special silvered screens and supply an audience of hundreds with spectacles. In this way 
molecular graphics became universally known within large sectors of chemical and 
biochemical science, especially in the pharmaceutical industry. Because the backgrounds of 
many displays were black by default, it was common for modelling sessions and lectures to 
be held with almost all lighting turned off. 

In the last decade almost all of this technology has become commoditized. IrisGL evolved to 
OpenGL so that molecular graphics can be run on any machine. In 1992, Roger Sayle 
released his RasMol program into the public domain. RasMol contained a very 
high-performance molecular renderer that ran on Unix/X Window, and Sayle later ported 
this to the Windows and Macintosh platforms. The Richardsons developed kinemages and 
the Mage software, which was also multi-platform. By specifying the chemical MIME type,
molecular models could be served over the Internet, so that for the first time MG could be
distributed at zero cost regardless of platform. In 1995, Birkbeck College's crystallography
department used this to run "Principles of Protein Structure", the first multimedia course
on the Internet, which reached 100 to 200 scientists.





Fig. 6. A molecule of Porin (protein) shown without ambient occlusion (left) and with (right). Advanced rendering 
effects can improve the comprehension of the 3D shape of a molecule. 

MG continues to see innovation that balances technology and art, and currently zero-cost or 
open source programs such as PyMOL and Jmol have very wide use and acceptance. 

Recently the widespread diffusion of advanced graphics hardware has improved the
rendering capabilities of visualization tools. The capabilities of current shading
languages allow the inclusion of advanced graphic effects (like ambient occlusion, cast
shadows and non-photorealistic rendering techniques) in the interactive visualization of
molecules. These graphic effects, besides being eye candy, can improve the comprehension
of the three-dimensional shapes of the molecules. An example of the effects that can be
achieved by exploiting recent graphics hardware can be seen in the simple open source
visualization system QuteMol.



Algorithms 



Reference frames 

Drawing molecules requires a transformation between molecular coordinates (usually, but 
not always, in Angstrom units) and the screen. Because many molecules are chiral it is 
essential that the handedness of the system (almost always right-handed) is preserved. In 
molecular graphics the origin (0, 0) is usually at the lower left, while in many computer 
systems the origin is at top left. If the z-coordinate is out of the screen (towards the viewer) 
the molecule will be referred to right-handed axes, while the screen display will be 
left-handed. 

Molecular transformations normally require: 

• scaling of the display (but not the molecule). 

• translations of the molecule and objects on the screen. 

• rotations about points and lines. 
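
The effect of such transformations on handedness can be checked with the determinant of the 3×3 matrix involved: a rotation has determinant +1 and preserves chirality, while a reflection such as flipping the y axis has determinant −1 and reverses it. The following sketch uses plain nested lists; the helper names are illustrative.

```python
import math


def rotation_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def flip_y():
    """Reflection that reverses the direction of the y axis."""
    return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def apply(m, point):
    """Apply a 3x3 matrix to an (x, y, z) point."""
    return tuple(sum(m[i][j] * point[j] for j in range(3)) for i in range(3))


print(det3(rotation_z(0.7)))  # close to +1: handedness preserved
print(det3(flip_y()))         # -1: handedness reversed
```

A program that flips an axis to match a screen convention therefore needs to compensate elsewhere if the displayed chirality is to stay correct.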

Conformational changes (e.g. rotations about bonds) require rotation of one part of the 
molecule relative to another. The programmer must decide whether a transformation on the
screen reflects a change of view or a change in the molecule or its reference frame.



Simple 




Fig. 7. Stick model of caffeine drawn in 
Jmol. 



In early displays only vectors could be drawn (e.g. Fig. 7); these are easy to draw
because no rendering or hidden surface removal is required.

On vector machines the lines would be smooth, but on raster devices Bresenham's
algorithm is used (note the "jaggies" on some of the bonds, which can be largely
removed with antialiasing software).
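
For reference, a compact integer-only form of Bresenham's line algorithm is sketched below. It returns the pixels rather than drawing them, so it can be tried without any graphics library; the function name is illustrative.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixels on the line from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return pixels


print(bresenham(0, 0, 4, 2))
```

The stair-step pattern in the returned pixels is the source of the "jaggies" seen on sloping bonds.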

Atoms can be drawn as circles, but these should be 
sorted so that those with the largest z-coordinates 
(nearest the screen) are drawn last. Although 
imperfect, this often gives a reasonably attractive 
display. Other simple tricks which do not include 
hidden surface algorithms are: 



• colouring each end of a bond with the same colour as the atom to which it is attached 
(Fig. 7). 

• drawing less than the whole length of the bond (e.g. 10%-90%) to simulate the bond 
sticking out of a circle. 

• adding a small offset white circle within the circle for an atom to simulate reflection. 
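
Two of these tricks are short enough to sketch directly: sorting atoms by z so that the nearest are drawn last, and trimming a bond to its 10%-90% portion. Atoms are represented here as plain (x, y, z) tuples, which is an assumption made for the example.

```python
def painter_order(atoms):
    """Sort atoms so that the largest z (nearest the viewer) is drawn last."""
    return sorted(atoms, key=lambda atom: atom[2])


def shorten_bond(p0, p1, start=0.1, end=0.9):
    """Return the end points of the start..end portion of the bond p0-p1."""
    def lerp(t):
        return tuple(a + t * (b - a) for a, b in zip(p0, p1))
    return lerp(start), lerp(end)


atoms = [(0.0, 0.0, 2.0), (1.0, 1.0, -1.0), (2.0, 2.0, 0.5)]
print(painter_order(atoms))
print(shorten_bond((0.0, 0.0), (10.0, 0.0)))
```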

Typical pseudocode for creating Fig. 7 (to fit the molecule exactly to the screen): 
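
A minimal Python sketch of such a routine is given here. It assumes atoms stored as (symbol, x, y) tuples, bonds as pairs of atom indices, a colour table keyed by element symbol and a caller-supplied `draw_line` function; all of these names are hypothetical. The comments (1) and (2) mark the lines that generate the screen coordinates of the two bond ends.

```python
def fit_to_screen(atoms, bonds, x_screen_max, y_screen_max, draw_line, colours):
    """Scale a molecule to the screen and draw each bond as two half-lines.

    atoms: list of (symbol, x, y) in molecular coordinates
    bonds: list of (i, j) index pairs into atoms
    draw_line(colour, x0, y0, x1, y1) is supplied by the caller
    """
    xs = [atom[1] for atom in atoms]
    ys = [atom[2] for atom in atoms]
    x_min, y_min = min(xs), min(ys)
    width = (max(xs) - x_min) or 1.0   # avoid dividing by zero
    height = (max(ys) - y_min) or 1.0
    scale = min(x_screen_max / width, y_screen_max / height)
    x_offset, y_offset = -x_min * scale, -y_min * scale
    for i, j in bonds:
        sym0, ax0, ay0 = atoms[i]
        sym1, ax1, ay1 = atoms[j]
        x0, y0 = x_offset + ax0 * scale, y_offset + ay0 * scale  # (1)
        x1, y1 = x_offset + ax1 * scale, y_offset + ay1 * scale  # (2)
        x_mid, y_mid = (x0 + x1) / 2, (y0 + y1) / 2
        # Each half of the bond takes the colour of the atom at its end.
        draw_line(colours[sym0], x0, y0, x_mid, y_mid)
        draw_line(colours[sym1], x1, y1, x_mid, y_mid)


drawn = []
fit_to_screen([("C", 0.0, 0.0), ("O", 2.0, 1.0)], [(0, 1)], 200, 100,
              lambda *args: drawn.append(args), {"C": "grey", "O": "red"})
print(drawn)
```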







Note that this assumes the origin is in the bottom left corner of the screen, with Y up the 
screen. Many graphics systems have the origin at the top left, with Y down the screen. In 
this case the lines (1) and (2) should have the y coordinate generation as: 

y0 = yScreenMax - (yOffset + atom0.getY() * scale) // (1)
y1 = yScreenMax - (yOffset + atom1.getY() * scale) // (2)

Changes of this sort change the handedness of the axes so it is easy to reverse the chirality 
of the displayed molecule unless care is taken. 

Advanced 

For greater realism and better comprehension of the 3D structure of a molecule many 
computer graphics algorithms can be used. For many years molecular graphics has 
stressed the capabilities of graphics hardware and has required hardware-specific 
approaches. With the increasing power of machines on the desktop, portability is more 
important and programs such as Jmol have advanced algorithms that do not rely on 
hardware. On the other hand recent graphics hardware is able to interactively render very 
complex molecule shapes with a quality that would not be possible with standard software 
techniques. 



Chronology 



This table provides an incomplete chronology of molecular graphics advances. 


Developer(s) | Approximate date | Technology | Comments
Crystallographers | < 1960 | Hand-drawn | Crystal structures, with hidden atom and bond removal. Often clinographic projections.
Cyrus Levinthal, Bob Langridge | 1960s | CRT | First protein display on screen (Project MAC).
Johnson, Motherwell | ca 1970 | Pen plotter | ORTEP, PLUTO. Very widely deployed for publishing crystal structures.
Langridge, White, Marshall | Late 1970s | Departmental systems (PDP-11, Tektronix displays or DEC-VT11, e.g. MMS-X) | Mixture of commodity computing with early displays.
T. Alwyn Jones | 1978 | FRODO | Crystallographic structure solution.
Davies, Hubbard | Mid-1980s | CHEM-X, HYDRA | Laboratory systems with multicolor, raster and vector devices (Sigmex, PS300).
Biosym, Tripos, Polygen | Mid-1980s | PS300 and lower cost dumb terminals (VT200, SIGMEX) | Commercial integrated modelling and display packages.
Silicon Graphics, Sun | Late 1980s | IRIS GL (UNIX) workstations | Commodity-priced single-user workstations with stereoscopic display.
EMBL - WHAT IF [4] | 1989, 2000 | Machine independent | Nearly free, multifunctional, still fully supported, many free servers based on it.
Sayle, Richardson | 1992, 1993 | RasMol, Kinemage | Platform-independent MG.
MDL (van Vliet, Maffett, Adler, Holt) | 1995-1998 | Chime | Proprietary C++; free browser plugin for Mac (OS9) and PCs.
ChemAxon | 1998- | MarvinSketch [6] & MarvinView [7], MarvinSpace [8] (2005) | Proprietary Java applet or stand-alone application.
Community efforts | 2000- | Jmol, PyMol, Protein Workshop (www.pdb.org) | Open-source Java applet or stand-alone application.
San Diego Supercomputer Center | 2006- | Sirius | Free for academic/non-profit institutions.
NOCH | 2002- | NOC [9] | Powerful and open source code molecular structure explorer.
Weizmann Institute of Science - Community efforts | 2008- | Proteopedia | Collaborative, 3D wiki encyclopedia of proteins & other molecules.

External links



• The PyMOL Molecular Graphics System (http://pymol.sf.net) -- open source 

• PyMOLWiki (http://pymolwiki.org) -- community supported wiki for PyMOL 

• History of Visualization of Biological Macromolecules 
(http://www.umass.edu/microbio/rasmol/history.htm) by Eric Martz and Eric Francoeur. 

• Brief History of Molecular Mechanics/Graphics 
(http://stanley.chem.lsu.edu/webpub/7770-Lecture-l-intro.pdf) in LSU CHEM7770 lecture notes. 

• Historical slides (http://luminary.stanford.edu/langridge/slides.htm) from Robert 
(Bob) Langridge. These show the influence of Crick and Watson on molecular graphics 
(including Levinthal's) and the development of early display technology, finishing with 
displays which were common in the mid-1980s on machines such as Evans and 
Sutherland's PS300 series. 

• Interview with Langridge (http://luminary.stanford.edu/langridge/langridge.html). 
The display looking down the axis of B-DNA has been likened to a rose window. 

• Nelson Max's home page (http://accad.osu.edu/~waynec/history/tree/max.html) 
with links to 1982 classics. 

• Jmol home page (http://jmol.sourceforge.net/) contains an applet with an automatic 
display of many features of molecular graphics including metaphors, scripting, 
annotation and animation. 

• Richardson Lab (http://kinemage.biochem.duke.edu/) includes Kinemage and 
molecular graphics images. 

• History of RasMol (http://www.openrasmol.org/history.html). 

• Molecule of the Month 
(http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/index.html) 
at RCSB/PDB. 

• xeo (http://sourceforge.net/projects/xeo): a free (GPL) open project management 
tool for nanostructures using Java. 

• Exhibitions of Molecular Graphics Art (http://www.scripps.edu/mb/goodsell/mgs_art/), 
1994, 1998. 

• NOCH home page (http://noch.sourceforge.net): a powerful, efficient and open source 
molecular graphics tool. 

• eMovie (http://www.weizmann.ac.il/ISPC/eMovie.html): a tool for creation of 
molecular animations with PyMOL. 

• Proteopedia (http://www.proteopedia.org): the collaborative, 3D encyclopedia of 
proteins and other molecules. 

• Ascalaph Graphics (http://www.agilemolecule.com/Ascalaph/Ascalaph_Graphics.html): 
a molecular viewer with some geometry editing capabilities. 

• Molecular Graphics and Modelling Society (http://www.mgms.org/). 

• Journal of Molecular Graphics and Modelling 
(http://www.sciencedirect.com/science?_ob=JournalURL&_cdi=5260&_auth=y&_acct=C000053194&_version=1&_urlVersion=0&_userid=1495569&md5=1e86bcce088e98890cea52f6eda84b64) 
(formerly Journal of Molecular Graphics). This journal is not open access. 

List of software for molecular mechanics modeling 

This is a list of computer programs that are predominantly used for molecular mechanics 
calculations. 

Min - Optimization, MD - Molecular Dynamics, MC - Monte Carlo, QM - Quantum 
mechanics, Imp - Implicit water, HA - Hardware accelerated. 

Y - Yes. 

I - Has interface. 



Each entry gives the program name, followed by its capabilities (columns View 3D, Model 
Builder, Min, MD, MC, QM, Imp, HA), comments, license and website where these are given. 

Abalone 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MD (Y), Imp (Y) 
    Comments: Biomolecular simulations, protein folding. 
    License: Not free 
    Website: Agile Molecule [1] 

ACEMD 
    Capabilities: Min (Y), MD (Y), HA (Y) 
    Comments: Molecular dynamics with CHARMM, Amber forcefields. Running on NVIDIA GPUs. Heavily optimized with CUDA. 
    License: Not free 
    Website: Acellera Ltd [3] 

AMBER 
    Capabilities: Model Builder (Y), Min (Y), MD (Y), Imp (Y) 
    License: Not free 
    Website: ambermd.org [5] 

Ascalaph Designer 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MD (Y), QM (I), HA (Y) 
    Comments: Molecular building (DNA, proteins, hydrocarbons, nanotubes). Molecular dynamics. GPU acceleration. 
    License: Free & Commercial 
    Website: Ascalaph Project [6] 

Balloon 
    Capabilities: Model Builder (Y), Min (Y) 
    Comments: 2D/3D conversion and conformational analysis. 
    License: Free to use, closed source 
    Website: Åbo Akademi [7] 

BOSS 
    Capabilities: Min (Y), MC (Y), QM (Y) 
    Comments: OPLS 
    License: Commercial 
    Website: University [8] 

CHARMM 
    Capabilities: Model Builder (Y), Min (Y), MD (Y), MC (Y), QM (I) 
    Comments: Commercial version with multiple graphical front ends is sold by Accelrys (as CHARMm). 
    License: Not free 
    Website: charmm.org [9] 

ChemSketch 
    Comments: Fast 2-D graphical molecule builder and 3-D viewer. Contains simplified CHARMM for fast stable inaccurate optimization of single molecules up to 1000 atoms. 
    Website: Advanced Chemistry Development Inc. [10] 

COSMOS 
    Comments: Hybrid QM/MM COSMOS-NMR force field with fast semi-empirical calculation of electrostatic and/or NMR properties. 3-D graphical molecule builder and viewer. 
    License: Free (without GUI) and commercial 
    Website: COSMOS Software [11] 

Desmond 
    Comments: High Performance MD. 
    License: Free and commercial 
    Website: D. E. Shaw Research 

GoVASP 
    Comments: GoVASP is a sophisticated graphical user interface for the Vienna Ab-initio Simulation Package (VASP). GoVASP comprises tools to prepare, perform and monitor VASP calculations and to evaluate and visualize the computed data. 
    License: Closed source. Not free/Trial available 
    Website: Windiks Consulting [13] 

GROMACS 
    Comments: High performance MD 
    License: Free 
    Website: gromacs.org [14] 

GROMOS 
    Comments: Geared towards biomolecules 
    License: Not free 

LAMMPS 
    Comments: Has potentials for soft and solid-state materials and coarse-grain systems 
    License: Free 
    Website: Sandia [15] 

MacroModel 
    Comments: OPLS-AA, GBSA solvent model, conformational sampling, minimization, MD 
    License: Not free 
    Website: Schrödinger, LLC [16] 

Materials Studio 
    Comments: Materials Studio is a software environment that brings the materials simulation technology to desktop computing, solving key problems throughout the R&D process. 
    License: Closed source, available 
    Website: Accelrys 

MedeA 
    Comments: MedeA combines leading experimental databases and major computational programs like the Vienna Ab-initio Simulation Package (VASP) with sophisticated materials property prediction, analysis, and visualization. 
    License: Closed source/Not free 
    Website: link [18] 

MCCCS Towhee 
    Comments: Originally designed for the prediction of fluid phase equilibria 
    License: Free 
    Website: Towhee Project [19] 

MDynaMix [20] 
    Comments: Parallel MD 
    License: Free 
    Website: Stockholm University [21] 

MOE 
    Comments: Molecular Operating Environment 
    License: Commercial 
    Website: Chemical Computing Group [22] 

MOIL 
    Comments: Also includes action-based algorithms (Stochastic Difference Equation in Time and Stochastic Difference Equation in Length) and locally enhanced sampling. 
    License: Free 
    Website: link [23] 

molecools 
    Comments: Simple Javascript molecular visualization tool 
    Website: link [24] 

MOLDY 
    Comments: Parallel, only pair-potentials, Cell lists, modified Beeman's algorithm 
    License: Free 
    Website: Moldy [25] 

NAB [26] 
    Capabilities: Model Builder (Y) 
    Comments: Generation of Models for "Unusual" DNA and RNA 
    License: Free 
    Website: Case group [27] 

Packmol 
    Capabilities: Model Builder (Y) 
    Comments: Builds complex initial configurations for Molecular Dynamics 
    Website: link [28] 

Prime 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MC (Y), QM (I), Imp (Y) 
    Comments: Homology modeling, loop and side chain optimization, minimization, OPLS-AA, SGB solvent model, parallelized 
    Website: link [29] 

Protein Local Optimization Program 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MD (Y) 
    Comments: Helix, loop, and side chain optimization. Fast energy minimization. 
    License: Not free 
    Website: link [30] 

QMOL 
    Capabilities: View 3D (Y) 
    Comments: Protein viewer 
    License: Free 
    Website: DNASTAR, Inc. [31] 

RasMol 
    Capabilities: View 3D (Y) 
    Comments: Fast viewer 
    License: Free 
    Website: RasMol [32] 

Raster 3D 
    Capabilities: View 3D (Y) 
    Comments: High quality raster images 
    License: Free 
    Website: University of Washington [33] 

STR3DI32 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MD (Y) 
    Comments: Sophisticated 3-D molecule builder and viewer, advanced structural analytical algorithms, full featured molecular modeling and quantitation of stereo-electronic effects, docking and the handling of complexes. 
    License: The 200 atom version is free 
    Website: Exorga, Inc. [34] 

Selvita Protein Modeling Platform 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MC (Y) 
    Comments: Protein structure prediction, homology modeling, ab initio modeling, loop modeling, protein threading 
    License: Commercial 
    Website: Selvita Ltd [35] 

TINKER 
    Capabilities: View 3D (I), Model Builder (Y), Min (Y), MD (Y), MC (Y), QM (I), Imp (Y) 
    Comments: Software Tools for Molecular Design 
    License: Free 
    Website: Washington University [36] 

UCSF Chimera 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y) 
    Comments: Visually appealing viewer, amino acid rotamers and other building, includes Antechamber and MMTK, Ambertools plugins in development. 
    Website: University of California [37] 

VMD + NAMD 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y), MD (Y), HA (?) 
    Comments: Fast, parallel MD 
    License: Free 
    Website: Beckman Institute [38] 

WHAT IF 
    Capabilities: View 3D (Y), Model Builder (Y), Min (I), MD (I), MC (I) 
    Comments: Visualizer for MD. Interface to GROMACS. 
    License: Not free 
    Website: WHAT IF [4] 

xeo 
    Capabilities: View 3D (Y), Model Builder (Y) 
    Comments: Open project management for nanostructures 
    Website: link [39] 

YASARA 
    Capabilities: View 3D (Y), Model Builder (Y), MD (Y), QM (Y) 
    Comments: Molecular-graphics, -modeling and -simulation program 
    License: Not free 
    Website: YASARA.org [40] 

Zodiac 
    Capabilities: View 3D (Y), Model Builder (Y), Min (Y) 
    Comments: Drug design suite 
    Website: link [41] 



See also 

Molecular dynamics 

Molecular Design software 

Molecule editor 

Molecular modeling on GPU 

Quantum chemistry computer programs 

List of nucleic acid simulation software 

List of protein structure prediction software 

Force field implementation 



External links 

SINCRIS [42] 

Linux4Chemistry [43] 

Collaborative Computational Project [44] 

World Index of Molecular Visualization Resources 

Short list of Molecular Modeling resources [46] 

OpenScience [47] 

Biological Magnetic Resonance Data Bank 

Materials modelling and computer simulation codes 

Molecular Modeling Applications to Complex Biomolecules 

Protein structure prediction 

Protein structure prediction is the prediction of the three-dimensional structure of a 
protein from its amino acid sequence, that is, the prediction of a protein's tertiary 
structure from its primary structure. It is one of the most important goals pursued by 
bioinformatics and theoretical chemistry. Protein structure prediction is of high importance 
in medicine (for example, in drug design) and biotechnology (for example, in the design of 
novel enzymes). Every two years, the performance of current methods is assessed in the 
CASP experiment. 

The practical role of protein structure prediction is now more important than ever. Massive 
amounts of protein sequence data are produced by modern large-scale DNA sequencing 
efforts such as the Human Genome Project. Despite community-wide efforts in structural 
genomics, the output of experimentally determined protein structures (typically by 
time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy) is 
lagging far behind the output of protein sequences. 

A number of factors exist that make protein structure prediction a very difficult task. The 
two main problems are that the number of possible protein structures is extremely large, 
and that the physical basis of protein structural stability is not fully understood. As a result, 
any protein structure prediction method needs a way to explore the space of possible 
structures efficiently (a search strategy), and a way to identify the most plausible structure 
(an energy function). 

In comparative structure prediction (also called homology modeling), the search space is 
pruned by the assumption that the protein in question adopts a structure that is reasonably 
close to the structure of at least one known protein. In de novo or ab initio structure 
prediction, no such assumption is made, which results in a much harder search problem. In 
both cases, an energy function is needed to recognize the native structure, and to guide the 
search for the native structure. Unfortunately, the construction of such an energy function 
is to a great extent an open problem. 

Direct simulation of protein folding in atomic detail, via methods such as molecular 
dynamics with a suitable energy function, is typically not tractable due to the high 
computational cost, despite the efforts of distributed computing projects such as 
Folding@home. Therefore, most de novo structure prediction methods rely on simplified 
representations of the atomic structure of proteins. 

The issues mentioned above apply to all proteins, including well-behaved, small, 
monomeric proteins. In addition, for specific proteins (such as multimeric 
proteins and disordered proteins), the following issues also arise: 

• Some proteins require stabilisation by additional domains or binding partners to adopt 
their native structure. This requirement is typically unknown in advance and difficult to 
handle by a prediction method. 

• The tertiary structure of a native protein may not be readily formed without the aid of 
additional agents. For example, proteins known as chaperones are required for some 
proteins to properly fold. Other proteins cannot fold properly without modifications such 
as glycosylation. 

• A particular protein may be able to assume multiple conformations depending on its 
chemical environment. 

• The biologically active conformation may not be the most thermodynamically favorable. 

Due to the increase in computer power, and especially new algorithms, much progress is 
being made to overcome these problems. However, routine de novo prediction of protein 
structures, even for small proteins, has still not been achieved. 

Ab initio protein modelling 

Ab initio (or de novo) protein modelling methods seek to build three-dimensional protein 
models "from scratch", i.e., based on physical principles rather than (directly) on previously 
solved structures. There are many possible procedures that either attempt to mimic protein 
folding or apply some stochastic method to search possible solutions (i.e., global 
optimization of a suitable energy function). These procedures tend to require vast 
computational resources, and have thus only been carried out for tiny proteins. To predict 
protein structure de novo for larger proteins will require better algorithms and larger 
computational resources like those afforded by either powerful supercomputers (such as 
Blue Gene or MDGRAPE-3) or distributed computing (such as Folding@home, the Human 
Proteome Folding Project and Rosetta@Home). Although these computational barriers are 
vast, the potential benefits of structural genomics (by predicted or experimental methods) 
make ab initio structure prediction an active research field. 
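
The idea of a stochastic search guided by an energy function can be illustrated with a small simulated annealing loop. This is only a sketch under strong assumptions: the "protein" is reduced to a list of torsion angles, and toy_energy is a made-up function with no physical meaning, not a real force field.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: each torsion prefers -60 degrees, neighbours prefer to agree."""
    local = sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)
    coupling = sum(1.0 - math.cos(math.radians(a - b)) for a, b in zip(angles, angles[1:]))
    return local + 0.5 * coupling


def anneal(n_angles=10, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over torsion angles; returns start and best energies."""
    rng = random.Random(seed)
    angles = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    energy = toy_energy(angles)
    start_energy = energy
    best_angles, best_energy = list(angles), energy
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_angles)
        old = angles[i]
        # Perturb one angle and wrap it back into [-180, 180).
        angles[i] = (old + rng.gauss(0.0, 30.0) + 180.0) % 360.0 - 180.0
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old
    return start_energy, best_energy, best_angles


start_energy, best_energy, best_angles = anneal()
```

Real ab initio methods differ mainly in the two ingredients the text names: a far more elaborate energy function, and move sets that respect chain geometry.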

As an intermediate step towards predicted protein structures, contact map predictions have 
been proposed. 
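
A contact map is a symmetric matrix marking which residue pairs are close in space. The sketch below computes one from C-alpha coordinates; the 8 angstrom cutoff and the coordinates are illustrative assumptions, since cutoffs and reference atoms vary between methods.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Return an N x N boolean matrix; True where two residues lie within cutoff."""
    n = len(ca_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts


# Four made-up C-alpha positions (in angstroms) placed along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
```

A predicted contact map supplies distance restraints that a later search step can try to satisfy.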

Comparative protein modelling 

Comparative protein modelling uses previously solved structures as starting points, or 
templates. This is effective because it appears that although the number of actual proteins 
is vast, there is a limited set of tertiary structural motifs to which most proteins belong. It 
has been suggested that there are only around 2000 distinct protein folds in nature, though 
there are many millions of different proteins. 

These methods may also be split into two groups: 

• Homology modeling is based on the reasonable assumption that two homologous 
proteins will share very similar structures. Because a protein's fold is more evolutionarily 
conserved than its amino acid sequence, a target sequence can be modeled with 
reasonable accuracy on a very distantly related template, provided that the relationship 
between target and template can be discerned through sequence alignment. It has been 
suggested that the primary bottleneck in comparative modelling arises from difficulties in 
alignment rather than from errors in structure prediction given a known-good 
alignment. Unsurprisingly, homology modelling is most accurate when the target and 
template have similar sequences. 

• Protein threading scans the amino acid sequence of an unknown structure against a 
database of solved structures. In each case, a scoring function is used to assess the 
compatibility of the sequence to the structure, thus yielding possible three-dimensional 
models. This type of method is also known as 3D-1D fold recognition due to its 
compatibility analysis between three-dimensional structures and linear protein 
sequences. This method has also given rise to methods performing an inverse folding 
search by evaluating the compatibility of a given structure with a large database of 
sequences, thus predicting which sequences have the potential to produce a given fold. 

Side chain geometry prediction 

Even structure prediction methods that are reasonably accurate for the peptide backbone 
often get the orientation and packing of the amino acid side chains wrong. Methods that 
specifically address the problem of predicting side chain geometry include dead-end 
elimination and the self-consistent mean field method. Both discretize the continuously 
varying dihedral angles that determine a side chain's orientation relative to the backbone 
into a set of rotamers with fixed dihedral angles. The methods then attempt to identify the 
set of rotamers that minimizes the model's overall energy. Rotamers are the side chain 
conformations with low energy. Such methods are most useful for analyzing the protein's 
hydrophobic core, where side chains are more closely packed; they have more difficulty 
addressing the looser constraints and higher flexibility of surface residues. 
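
Dead-end elimination can be sketched as follows, using the simplest form of the criterion: a rotamer is discarded if its best possible energy is still worse than the worst possible energy of a competing rotamer at the same site. The energy tables self_e and pair_e are invented numbers, and all function names are hypothetical.

```python
import itertools


def pair(pair_e, i, r, j, s):
    """Pair energy between rotamer r at site i and rotamer s at site j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dead_end_elimination(self_e, pair_e):
    """Repeat the elimination criterion until no more rotamers can be removed."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best case for r: lowest pair energy available at every other site.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    # Worst case for the competing rotamer t.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(self_e, pair_e, choice):
    """Energy of one full rotamer assignment (one rotamer index per site)."""
    n = len(choice)
    energy = sum(self_e[i][choice[i]] for i in range(n))
    energy += sum(pair_e[(i, j)][choice[i]][choice[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy


# Made-up energies for three sites with 2, 3 and 2 rotamers.
self_e = [[0.0, 5.0], [1.0, 0.0, 6.0], [0.0, 2.0]]
pair_e = {
    (0, 1): [[0.0, 0.5, 0.0], [0.2, 0.0, 0.1]],
    (0, 2): [[0.0, 0.3], [0.1, 0.0]],
    (1, 2): [[0.0, 0.4], [0.2, 0.0], [0.0, 0.0]],
}
survivors = dead_end_elimination(self_e, pair_e)
# Brute-force check: the global minimum must use only surviving rotamers.
best = min(itertools.product(*[range(len(e)) for e in self_e]),
           key=lambda c: total_energy(self_e, pair_e, c))
```

The brute-force enumeration is feasible only for toy cases; its role here is to confirm that elimination never removes a rotamer belonging to the global minimum.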

Protein-protein complexes 

In the case of complexes of two or more proteins, where the structures of the proteins are 
known or can be predicted with high accuracy, protein-protein docking methods can be 
used to predict the structure of the complex. Information on the effect of mutations at 
specific sites on the affinity of the complex helps in understanding the complex structure and 
in guiding docking methods. 

Software 

MODELLER is a popular software tool for producing homology models using methodology 
derived from NMR spectroscopy data processing. SwissModel provides an automated 
web server for basic homology modeling. I-TASSER was ranked as the best server for protein 
structure prediction in the CASP7 and CASP8 experiments. Common software tools for 
protein threading are HHpred/HHsearch, bioinfo.pl, Robetta, and Phyre. RAPTOR is 
protein threading software based on integer programming. The basic algorithm for 
threading is fairly straightforward to implement. Abalone is a molecular dynamics program 
for folding simulations with explicit or implicit water models. 

Several distributed computing projects concerning protein structure prediction have also 
been implemented, such as Folding@home, Rosetta@home, Human Proteome Folding 
Project, Predictor@home and TANPAKU. The Foldit program seeks to investigate the 
pattern-recognition and puzzle-solving abilities inherent to the human mind in order to 
create more successful computer protein structure prediction software. 

Computational approaches provide a fast alternative route to antibody structure prediction. 
Recently developed high-resolution structure prediction algorithms for the antibody Fv 
region, such as RosettaAntibody (http://antibody.graylab.jhu.edu), have been shown to 
generate high-resolution homology models which have been used for successful docking. 

Reviews of software for structure prediction, and of the progress and challenges in 
protein structure prediction, are available in the literature. 

Automatic structure prediction servers 

CASP, which stands for Critical Assessment of Techniques for Protein Structure Prediction, 
is a community-wide experiment for protein structure prediction that has taken place every 
two years since 1994. CASP provides users and research groups with an opportunity to assess 
the quality of available methods and automatic servers for protein structure prediction. 
Official results for automatic structure prediction servers in the CASP7 benchmark (2006) 
are discussed by Battey et al. Official CASP8 results are also available. 
Preliminary, unofficial results for automatic servers in the CASP8 benchmark are 
summarized on several lab websites and ranked according to slightly varying criteria: 
Zhang lab [17], Grishin lab [18], McGuffin lab [19], Baker lab [20], Cheng lab [21]. 

See also 

• Protein design 

• Protein structure prediction software 

• Protein-protein interaction prediction 

• Molecular modeling software 

• CASP: Annual Protein Structure Prediction Competition 

Protein design 



Protein design is the design of new protein molecules from scratch, or the deliberate 
design of a new molecule by making calculated variations on a known structure. The 
number of possible amino acid sequences is enormous, but only a subset of these sequences 
will fold reliably and quickly to a single native state. Protein design involves identifying 
such sequences, in particular those with a physiologically active native state. Protein design 
is a rational design technique used in protein engineering. 

Protein design requires an understanding of the molecular interactions that stabilize 
proteins in specific folded configurations; experience has shown, however, that protein 
design does not require an understanding of the dynamical process by which proteins fold. 
In a sense it is the reverse of structure prediction: a tertiary structure is specified, and an 
amino acid sequence is identified which will fold to it. 

Protein design is also referred to as inverse folding. From a physical point of view, the 
native state conformation of a protein is the free energy minimum for the protein chain. 
Hence, designing a new protein involves the identification of the sequences which have the 
chosen structure as their free energy minimum. This can be done by use of computer models, 
which, while simplifying the problem, are able to generate sequences that fold to the desired 
structure. 

The design of minimalist computer models of proteins (lattice proteins), and the secondary 
structural modification of real proteins, began in the mid-1990s. The de novo design of real 
proteins became possible shortly afterwards, and the 21st century has seen the creation of 
small proteins with real biological function including catalysis and antiviral behaviour. 
There is great hope that the design of these and larger proteins will have application in 
medicine and bioengineering. 

Computational protein design algorithms seek to identify amino acid sequences that have 
low energies for target structures. While the sequence-conformation space that needs to be 
searched is large, the most challenging requirement for computational protein design is a 
fast, yet accurate, energy function that can distinguish optimal sequences from similar 
suboptimal ones. Using computational methods, a protein with a novel fold has been 
designed [1], as well as sensors for unnatural molecules [2]. 

On the other hand, it is widely believed that not all possible protein structures are 
designable, which means that there are compact configurations of the chain which no 
sequences can fold to. In particular, conformations which are poor in secondary structures 
are unlikely to be designable. The designability of given structures is still an issue that is 
poorly understood. 

Models of protein structure and function used in protein design 

Computational protein design algorithms use models of protein energetics to evaluate how 
mutations would affect a protein's structure and function. These energy functions typically 
include a combination of molecular mechanics, knowledge-based, and other empirical terms. 
However, the trend has been towards using more physically based potential energy functions. 

[Figure: Comparison of various potential energy functions] 



Software 

EGAD: A Genetic Algorithm for protein Design [4]. A free, open-source software 
package for protein design and prediction of mutation effects on protein folding stabilities 
and binding affinities. EGAD can also consider multiple structures simultaneously for 
designing specific binding proteins or locking proteins into specific conformational states. 
In addition to natural protein residues, EGAD can also consider free-moving ligands with or 
without rotatable bonds. EGAD can be used with single or multiple processors. 

SHARPEN: A permissive open-source library for protein design and structure 
prediction. SHARPEN offers a variety of combinatorial optimization methods (e.g. Monte 
Carlo, Simulated Annealing, FASTER) and can score proteins using the successful 
Rosetta all-atom force field or molecular mechanics force fields (OPLSaa). In addition to the 
protein modeling library, SHARPEN includes tools for scalable distributed computing. 

WHAT IF: software for protein modelling, design, validation, and visualisation. 

Abalone: software for protein modelling and visualisation. 

Homology modeling 



Homology modeling, also known as comparative modeling of proteins, refers to 
constructing an atomic-resolution model of the "target" protein from its amino acid 
sequence and an experimental three-dimensional structure of a related homologous protein 
(the "template"). Homology modeling relies on the identification of one or more known 
protein structures likely to resemble the structure of the query sequence, and on the 
production of an alignment that maps residues in the query sequence to residues in the 
template sequence. The sequence alignment and template structure are then used to 
produce a structural model of the target. Because protein structures are more conserved 
than DNA sequences, detectable levels of sequence similarity usually imply significant 
structural similarity. 
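
Sequence identity, the measure used throughout this section, can be computed from a pairwise alignment as sketched below. The function name and the convention of counting only columns without gaps are assumptions; published identity values differ in how they choose the denominator.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of aligned (gap-free) columns whose two residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Made-up aligned fragments; "-" marks a gap in the target.
identity = percent_identity("MKT-AYIAK", "MKSQAYLAK")
```

Here six of the eight gap-free columns match, giving 75% identity.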

The quality of the homology model is dependent on the quality of the sequence alignment 
and template structure. The approach can be complicated by the presence of alignment 
gaps (commonly called indels) that indicate a structural region present in the target but not 
in the template, and by structure gaps in the template that arise from poor resolution in the 
experimental procedure (usually X-ray crystallography) used to solve the structure. Model 
quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean 
square deviation between the matched Cα atoms at 70% sequence identity but only 2-4 Å 
agreement at 25% sequence identity. However, the errors are significantly higher in the 
loop regions, where the amino acid sequences of the target and template proteins may be 
completely different. 
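
The root mean square deviation quoted above can be computed as in the sketch below. It assumes that the two structures are already superimposed and that the coordinate lists hold matched C-alpha atoms in the same order; in practice an optimal superposition is found first. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Made-up model and reference C-alpha positions, offset by 1 angstrom in y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
value = rmsd(model, reference)
```

Every atom is displaced by exactly 1 angstrom, so the result is 1 angstrom.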

Regions of the model that were constructed without a template, usually by loop modeling, 
are generally much less accurate than the rest of the model. Errors in side chain packing 
and position also increase with decreasing identity, and variations in these packing 
configurations have been suggested as a major reason for poor model quality at low 
identity. Taken together, these various atomic-position errors are significant and impede 
the use of homology models for purposes that require atomic-resolution data, such as drug 
design and protein-protein interaction predictions; even the quaternary structure of a 
protein may be difficult to predict from homology models of its subunit(s). Nevertheless, 
homology models can be useful in reaching qualitative conclusions about the biochemistry 
of the query sequence, especially in formulating hypotheses about why certain residues are 
conserved, which may in turn lead to experiments to test those hypotheses. For example, 
the spatial arrangement of conserved residues may suggest whether a particular residue is 
conserved to stabilize the folding, to participate in binding some small molecule, or to 
foster association with another protein or nucleic acid. 

Homology modeling can produce high-quality structural models when the target and 
template are closely related, which has inspired the formation of a structural genomics 
consortium dedicated to the production of representative experimental structures for all 
classes of protein folds. The chief inaccuracies in homology modeling, which worsen with 
lower sequence identity, derive from errors in the initial sequence alignment and from 
improper template selection. Like other methods of structure prediction, current practice 
in homology modeling is assessed in a biennial large-scale experiment known as the 
Critical Assessment of Techniques for Protein Structure Prediction, or CASP. 

Motivation 

The method of homology modeling is based on the observation that protein tertiary 
structure is better conserved than amino acid sequence. Thus, even proteins that have 
diverged appreciably in sequence but still share detectable similarity will also share 
common structural properties, particularly the overall fold. Because it is difficult and 
time-consuming to obtain experimental structures from methods such as X-ray 
crystallography and protein NMR for every protein of interest, homology modeling can 
provide useful structural models for generating hypotheses about a protein's function and 
directing further experimental work. 

There are exceptions to the general rule that proteins sharing significant sequence identity 
will share a fold. For example, a judiciously chosen set of mutations of less than 50% of a 
protein can cause the protein to adopt a completely different fold. However, such a 
massive structural rearrangement is unlikely to occur in evolution, especially since the 
protein is usually under the constraint that it must fold properly and carry out its function 
in the cell. Consequently, the roughly folded structure of a protein (its "topology") is 
conserved longer than its amino-acid sequence and much longer than the corresponding 
DNA sequence; in other words, two proteins may share a similar fold even if their 
evolutionary relationship is so distant that it cannot be discerned reliably. For comparison, 
the function of a protein is conserved much less than the protein sequence, since relatively 
few changes in amino-acid sequence are required to take on a related function. 

Steps in model production 

The homology modeling procedure can be broken down into four sequential steps: template 
selection, target-template alignment, model construction, and model assessment. The 
first two steps are often essentially performed together, as the most common methods of 
identifying templates rely on the production of sequence alignments; however, these 
alignments may not be of sufficient quality because database search techniques prioritize 
speed over alignment quality. These processes can be performed iteratively to improve the 
quality of the final model, although quality assessments that are not dependent on the true 
target structure are still under development. 

Optimizing the speed and accuracy of these steps for use in large-scale automated structure 
prediction is a key component of structural genomics initiatives, partly because the 
resulting volume of data will be too large to process manually and partly because the goal
of structural genomics requires providing models of reasonable quality to researchers who
are not themselves structure prediction experts.




Template selection and sequence alignment 

The critical first step in homology modeling is the identification of the best template 
structure, if indeed any are available. The simplest method of template identification relies 
on serial pairwise sequence alignments aided by database search techniques such as 
FASTA and BLAST. More sensitive methods based on multiple sequence alignment - of 
which PSI-BLAST is the most common example - iteratively update their position-specific 
scoring matrix to successively identify more distantly related homologs. This family of 
methods has been shown to produce a larger number of potential templates and to identify 
better templates for sequences that have only distant relationships to any solved structure. 
Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a 
search technique for identifying templates to be used in traditional homology modeling 
methods. When performing a BLAST search, a reliable first approach is to identify hits 
with a sufficiently low E-value, which are considered sufficiently close in evolution to make 
a reliable homology model. Other factors may tip the balance in marginal cases; for 
example, the template may have a function similar to that of the query sequence, or it may 
belong to a homologous operon. However, a template with a poor E-value should generally
not be chosen, even if it is the only one available, since it may well have a wrong structure, 
leading to the production of a misguided model. A better approach is to submit the primary 
sequence to fold-recognition servers or, better still, consensus meta-servers which improve 
upon individual fold-recognition servers by identifying similarities (consensus) among 
independent predictions. 
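
The E-value screening described above can be sketched as a simple filter over search hits. This is a minimal illustration, not the behaviour of any particular search tool: the `Hit` record, the template identifiers, and the cutoff values (`max_e_value`, `min_coverage`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    template_id: str     # hypothetical identifier of a solved structure
    e_value: float       # expectation value reported by the database search
    identity: float      # percent identity over the aligned region
    aligned_length: int  # number of query residues covered by the alignment

def rank_templates(hits, query_length, max_e_value=1e-5, min_coverage=0.5):
    """Keep hits that pass both cutoffs and return them best first."""
    kept = []
    for hit in hits:
        coverage = hit.aligned_length / query_length
        if hit.e_value <= max_e_value and coverage >= min_coverage:
            kept.append((hit, coverage))
    # Lower E-value first; ties are broken by higher coverage.
    kept.sort(key=lambda pair: (pair[0].e_value, -pair[1]))
    return [hit for hit, _ in kept]

# Illustrative hits only; none of these values come from a real search.
hits = [
    Hit("tmplA", 1e-40, 62.0, 180),
    Hit("tmplB", 1e-12, 35.0, 150),
    Hit("tmplC", 0.5, 22.0, 190),    # rejected: poor E-value
    Hit("tmplD", 1e-20, 48.0, 60),   # rejected: covers too little of the query
]
ranked = rank_templates(hits, query_length=200)
print([h.template_id for h in ranked])
```

Appropriate cutoffs depend on the database size and the purpose of the model, so the defaults here should be read as placeholders rather than recommendations.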

Often several candidate template structures are identified by these approaches. Although 
some methods can generate hybrid models from multiple templates, most methods rely on a 
single template. Therefore, choosing the best template from among the candidates is a key 
step, and can affect the final accuracy of the structure significantly. This choice is guided 
by several factors, such as the similarity of the query and template sequences, of their 
functions, and of the predicted query and observed template secondary structures. Perhaps
most important are the coverage of the aligned regions, that is, the fraction of the query sequence
structure that can be predicted from the template, and the plausibility of the resulting
model. Thus, sometimes several homology models are produced for a single query
sequence, with the most likely candidate chosen only in the final step. 

It is possible to use the sequence alignment generated by the database search technique as 
the basis for the subsequent model production; however, more sophisticated approaches 
have also been explored. One proposal generates an ensemble of stochastically defined 
pairwise alignments between the target sequence and a single identified template as a 
means of exploring "alignment space" in regions of sequence with low local similarity.
"Profile-profile" alignments first generate a sequence profile of the target and
systematically compare it to the sequence profiles of solved structures; the coarse-graining
inherent in the profile construction is thought to reduce noise introduced by sequence drift
in nonessential regions of the sequence.
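
As a rough illustration of what a sequence profile is, the sketch below computes per-column residue frequencies from a small multiple alignment and compares two columns with a dot product. It is a simplified assumption-laden example: real profile methods such as PSI-BLAST use log-odds scores with pseudocounts and sequence weighting, and the function names and toy alignment here are hypothetical.

```python
from collections import Counter

def build_profile(alignment, gap="-"):
    """Return, for each alignment column, a dict of residue -> relative frequency."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gaps are ignored when counting residues in a column.
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

def column_similarity(col_a, col_b):
    """Dot product of two frequency columns; 1.0 for identical one-residue columns."""
    return sum(freq * col_b.get(res, 0.0) for res, freq in col_a.items())

# Toy alignment of four homologous sequences (illustrative only).
target_msa = ["MKVL", "MRVL", "MKIL", "MK-L"]
profile = build_profile(target_msa)
print(profile[1])  # frequencies of K and R in the second column
```

Comparing columns rather than single residues is what lets a variable but conservatively substituted position still score as similar between two families.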

Model generation

Given a template and an alignment, the information contained therein must be used to 
generate a three-dimensional structural model of the target, represented as a set of 
Cartesian coordinates for each atom in the protein. Three major classes of model 
generation methods have been proposed. 

Fragment assembly 

The original method of homology modeling relied on the assembly of a complete model from 
conserved structural fragments identified in closely related solved structures. For example, 
a modeling study of serine proteases in mammals identified a sharp distinction between 
"core" structural regions conserved in all experimental structures in the class, and variable 
regions typically located in the loops where the majority of the sequence differences were 
localized. Thus unsolved proteins could be modeled by first constructing the conserved core 
and then substituting variable regions from other proteins in the set of solved 
structures. Current implementations of this method differ mainly in the way they deal 
with regions that are not conserved or that lack a template. 

Segment matching 

The segment-matching method divides the target into a series of short segments, each of 
which is matched to its own template fitted from the Protein Data Bank. Thus, sequence 
alignment is done over segments rather than over the entire protein. Selection of the 
template for each segment is based on sequence similarity, comparisons of alpha carbon 
coordinates, and predicted steric conflicts arising from the van der Waals radii of the 
divergent atoms between target and template. [12] 

Satisfaction of spatial restraints 

The most common current homology modeling method takes its inspiration from 
calculations required to construct a three-dimensional structure from data generated by 
NMR spectroscopy. One or more target-template alignments are used to construct a set of 
geometrical criteria that are then converted to probability density functions for each 
restraint. Restraints applied to the main protein internal coordinates - protein backbone 
distances and dihedral angles - serve as the basis for a global optimization procedure that 
originally used conjugate gradient energy minimization to iteratively refine the positions of 
all heavy atoms in the protein.
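
The idea of turning geometric restraints into probability density functions and optimizing against them can be sketched as follows. This is only an illustrative toy, assuming Gaussian distance restraints between alpha carbons; it is not MODELLER's interface or its actual restraint forms, and the names `restraint_penalty` and `objective` are hypothetical.

```python
import math

def restraint_penalty(distance, mean, sigma):
    """Negative log of a Gaussian density centred on `mean` with width `sigma`."""
    z = (distance - mean) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))

def objective(coords, restraints):
    """Sum the penalties over all (i, j, mean, sigma) distance restraints."""
    total = 0.0
    for i, j, mean, sigma in restraints:
        total += restraint_penalty(math.dist(coords[i], coords[j]), mean, sigma)
    return total

# Assumed restraints: consecutive alpha carbons about 3.8 angstroms apart.
restraints = [(0, 1, 3.8, 0.1), (1, 2, 3.8, 0.1)]
good_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
distorted_model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.6, 0.0, 0.0)]

# The model that satisfies the restraints has the lower objective value.
print(objective(good_model, restraints) < objective(distorted_model, restraints))
```

An optimizer such as conjugate gradient would then adjust the coordinates to lower this objective; that step is omitted here to keep the sketch short.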

This method has been dramatically expanded to apply specifically to loop modeling, which
can be extremely difficult due to the high flexibility of loops in proteins in aqueous 
solution. A more recent expansion applies the spatial-restraint model to electron density 
maps derived from cryoelectron microscopy studies, which provide low-resolution
information that is not usually itself sufficient to generate atomic-resolution structural
models. To address the problem of inaccuracies in initial target-template sequence
alignment, an iterative procedure has also been introduced to refine the alignment on the
basis of the initial structural fit. The most commonly used software in spatial
restraint-based modeling is MODELLER, and a database called ModBase has been
established for reliable models generated with it.

Loop modeling

Regions of the target sequence that are not aligned to a template are modeled by loop 
modeling; they are the most susceptible to major modeling errors and occur with higher 
frequency when the target and template have low sequence identity. The coordinates of 
unmatched sections determined by loop modeling programs are generally much less 
accurate than those obtained from simply copying the coordinates of a known structure, 
particularly if the loop is longer than 10 residues. The first two sidechain dihedral angles
(χ1 and χ2) can usually be estimated within 30° for an accurate backbone structure;
however, the later dihedral angles found in longer side chains such as lysine and arginine
are notoriously difficult to predict. Moreover, small errors in χ1 (and, to a lesser extent, in
χ2) can cause relatively large errors in the positions of the atoms at the terminus of the side
chain; such atoms often have a functional importance, particularly when located near the
active site.
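
A dihedral angle such as χ1 is defined by four consecutive atoms, and comparing a predicted angle with an observed one requires handling the wrap-around at ±180°. The following is a minimal sketch using only the standard library; the helper names are hypothetical and the coordinates are made-up test points, not atoms from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def angular_error(predicted, observed):
    """Smallest absolute difference between two angles in degrees."""
    return abs((predicted - observed + 180.0) % 360.0 - 180.0)

cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans, angular_error(170.0, -170.0))
```

With `angular_error`, the "within 30°" criterion mentioned above becomes a simple comparison, for example `angular_error(predicted_chi1, observed_chi1) <= 30.0`.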

Model assessment 

Assessment of homology models without reference to the true target structure is usually 
performed with two methods: statistical potentials or physics-based energy calculations. 
Both methods produce an estimate of the energy (or an energy-like analog) for the model or 
models being assessed; independent criteria are needed to determine acceptable cutoffs. 
Neither of the two methods correlates exceptionally well with true structural accuracy, 
especially on protein types underrepresented in the PDB, such as membrane proteins. 

Statistical potentials are empirical methods based on observed residue-residue contact 
frequencies among proteins of known structure in the PDB. They assign a probability or 
energy score to each possible pairwise interaction between amino acids and combine these 
pairwise interaction scores into a single score for the entire model. Some such methods can 
also produce a residue-by-residue assessment that identifies poorly scoring regions within 
the model, though the model may have a reasonable score overall. These methods 
emphasize the hydrophobic core and solvent-exposed polar amino acids often present in 
globular proteins. Examples of popular statistical potentials include Prosa and DOPE.
Statistical potentials are more computationally efficient than energy calculations.
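
One common way to turn contact frequencies into an energy-like score is the inverse Boltzmann relation, E = -kT ln(observed / expected). The sketch below applies it to a toy contact table. It is a simplified illustration under stated assumptions: the frequencies are invented for the example, and real potentials such as Prosa and DOPE are distance-dependent and considerably more elaborate.

```python
import math

def contact_energy(observed, expected, kT=1.0):
    """Inverse Boltzmann score: contacts seen more often than expected score below zero."""
    return -kT * math.log(observed / expected)

def score_model(contacts, energy_table):
    """Sum pairwise scores over all residue-residue contacts in a model."""
    total = 0.0
    for a, b in contacts:
        total += energy_table[tuple(sorted((a, b)))]
    return total

# Invented frequencies, for illustration only: (observed, expected) per residue pair.
frequencies = {
    ("LEU", "VAL"): (0.04, 0.02),  # hydrophobic pair, over-represented
    ("ASP", "LEU"): (0.01, 0.02),  # charged-hydrophobic pair, under-represented
}
energy_table = {pair: contact_energy(obs, exp) for pair, (obs, exp) in frequencies.items()}

well_packed = [("LEU", "VAL"), ("VAL", "LEU")]
poorly_packed = [("LEU", "ASP"), ("ASP", "LEU")]
print(score_model(well_packed, energy_table), score_model(poorly_packed, energy_table))
```

Because the score is a simple table lookup per contact, evaluating a model is far cheaper than a force-field energy calculation, which is the efficiency advantage noted above.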

Physics-based energy calculations aim to capture the interatomic interactions that are 
physically responsible for protein stability in solution, especially van der Waals and 
electrostatic interactions. These calculations are performed using a molecular mechanics 
force field; proteins are normally too large even for semi-empirical quantum 
mechanics-based calculations. The use of these methods is based on the energy landscape 
hypothesis of protein folding, which predicts that a protein's native state is also its energy 
minimum. Such methods usually employ implicit solvation, which provides a continuous 
approximation of a solvent bath for a single protein molecule without necessitating the 
explicit representation of individual solvent molecules. A force field specifically constructed 
for model assessment is known as the Effective Force Field (EFF) and is based on atomic 
parameters from CHARMM. [19] 
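
The van der Waals and electrostatic terms mentioned above are commonly written as a Lennard-Jones 12-6 potential and a Coulomb term. The following sketch evaluates both for a single atom pair. The functional forms are standard, but the parameter values are arbitrary placeholders rather than parameters from CHARMM or any other force field, and the conversion constant is an approximate value for units of kcal/mol, angstroms, and elementary charges.

```python
import math

COULOMB_CONSTANT = 332.06  # approximate, kcal*angstrom/(mol*e^2)

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy; zero at r = sigma, minimum of -epsilon."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)

def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy between two point charges in a uniform dielectric."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

def pair_energy(r, epsilon, sigma, q1, q2, dielectric=1.0):
    """Non-bonded energy of one atom pair: van der Waals plus electrostatics."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2, dielectric)

# Placeholder parameters for illustration only.
r_min = 2.0 ** (1.0 / 6.0) * 3.5
print(lennard_jones(r_min, epsilon=0.1, sigma=3.5))
print(pair_energy(4.0, epsilon=0.1, sigma=3.5, q1=0.5, q2=-0.5, dielectric=80.0))
```

Raising `dielectric` is the crudest possible stand-in for solvent screening; implicit solvation models are more sophisticated than a single constant.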

A very extensive model validation report can be obtained using the Radboud Universiteit
Nijmegen "What Check" software, which is one option of the Radboud Universiteit
Nijmegen "What If" software package; it produces a many-page document with extensive
analyses of nearly 200 scientific and administrative aspects of the model. "What Check" is
available as a free server; it can also be used to validate experimentally determined
structures of macromolecules.

One newer method for model assessment relies on machine learning techniques such as 
neural nets, which may be trained to assess the structure directly or to form a consensus 
among multiple statistical and energy-based methods. Very recent results using support
vector machine regression on a jury of more traditional assessment methods outperformed
common statistical, energy-based, and machine learning methods.

Structural comparison methods 

The assessment of homology models' accuracy is straightforward when the experimental 
structure is known. The most common method of comparing two protein structures uses the 
root-mean-square deviation (RMSD) metric to measure the mean distance between the 
corresponding atoms in the two structures after they have been superimposed. However, 
RMSD underestimates the accuracy of models in which the core is essentially correctly
modeled, but some flexible loop regions are inaccurate. A method introduced for the 
modeling assessment experiment CASP is known as the global distance test (GDT) and 
measures the total number of atoms whose distance from the model to the experimental 
structure lies under a certain distance cutoff. Both methods can be used for any subset 
of atoms in the structure, but are often applied to only the alpha carbon or protein 
backbone atoms to minimize the noise created by poorly modeled side chain rotameric
states, which most modeling methods are not optimized to predict.
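
The difference between the two measures can be seen in a small sketch. It assumes the two coordinate sets are already superimposed and matched atom by atom, and it uses the 1, 2, 4 and 8 Å cutoffs of the GDT_TS variant; a full GDT calculation also searches over many superpositions, which is omitted here. The coordinates are invented for illustration.

```python
import math

def rmsd(model, reference):
    """Root-mean-square deviation between matched, pre-superimposed atoms."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(model))

def fraction_within(model, reference, cutoff):
    """Fraction of atoms whose displacement is at most `cutoff`."""
    close = sum(1 for a, b in zip(model, reference) if math.dist(a, b) <= cutoff)
    return close / len(model)

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of atoms within each cutoff, for one fixed superposition."""
    fractions = [fraction_within(model, reference, c) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Three atoms are placed exactly; one "loop" atom is displaced by 10 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 10.0, 0.0)]
print(rmsd(model, reference), gdt_ts(model, reference))
```

In this toy case a single misplaced atom raises the RMSD to 5 Å even though three quarters of the atoms are placed exactly, while the GDT-style score of 75 reflects the correctly modeled majority.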

Benchmarking 

Several large-scale benchmarking efforts have been made to assess the relative quality of 
various current homology modeling methods. CASP is a community-wide prediction 
experiment that runs every two years during the summer months and challenges prediction 
teams to submit structural models for a number of sequences whose structures have 
recently been solved experimentally but have not yet been published. Its partner CAFASP 
has run in parallel with CASP but evaluates only models produced via fully automated 
servers. Continuously running experiments that do not have prediction 'seasons' focus 
mainly on benchmarking publicly available webservers. LiveBench and EVA run 
continuously to assess participating servers' performance in prediction of imminently 
released structures from the PDB. CASP and CAFASP serve mainly as evaluations of the 
state of the art in modeling, while the continuous assessments seek to evaluate the model 
quality that would be obtained by a non-expert user employing publicly available tools. 

Accuracy 

The accuracy of the structures generated by homology modeling is highly dependent on the 
sequence identity between target and template. Above 50% sequence identity, models tend 
to be reliable, with only minor errors in side chain packing and rotameric state, and an 
overall RMSD between the modeled and the experimental structure falling around 1 Å. This
error is comparable to the typical resolution of a structure solved by NMR. In the 30-50% 
identity range, errors can be more severe and are often located in loops. Below 30% 
identity, serious errors occur, sometimes resulting in the basic fold being mis-predicted. 
This low-identity region is often referred to as the "twilight zone" within which homology 
modeling is extremely difficult, and to which it is possibly less suited than fold recognition 
methods. [24]

At high sequence identities, the primary source of error in homology modeling derives from
the choice of the template or templates on which the model is based, while lower identities 
exhibit serious errors in sequence alignment that inhibit the production of high-quality 
models. It has been suggested that the major impediment to quality model production is 
inadequacies in sequence alignment, since "optimal" structural alignments between two 
proteins of known structure can be used as input to current modeling methods to produce 
quite accurate reproductions of the original experimental structure.

Attempts have been made to improve the accuracy of homology models built with existing 
methods by subjecting them to molecular dynamics simulation in an effort to improve their 
RMSD to the experimental structure. However, current force field parameterizations may 
not be sufficiently accurate for this task, since homology models used as starting structures 
for molecular dynamics tend to produce slightly worse structures. Slight improvements 
have been observed in cases where significant restraints were used during the 
simulation.
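
The identity ranges described in this section can be expressed as a small helper that computes percent identity from a pairwise alignment and maps it onto the three regimes. This is a sketch under stated assumptions: identity is computed here over aligned, non-gap positions only (other denominators are also in use), the category labels are informal, and the example sequences are invented.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of identical residues among positions where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def accuracy_regime(identity):
    """Map percent identity onto the three regimes described in the text."""
    if identity > 50.0:
        return "high"          # minor side chain errors, RMSD around 1 angstrom
    if identity >= 30.0:
        return "intermediate"  # more severe errors, often in loops
    return "twilight zone"     # the fold itself may be mis-predicted

# Invented alignment for illustration: 5 identical residues out of 7 aligned pairs.
identity = percent_identity("MKVLA-GT", "MKILAQGS")
print(round(identity, 1), accuracy_regime(identity))
```

The thresholds are rules of thumb rather than sharp boundaries, so a result close to 30% or 50% deserves more scrutiny than the label alone suggests.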

Sources of error 

The two most common and large-scale sources of error in homology modeling are poor 
template selection and inaccuracies in target-template sequence alignment. 
Controlling for these two factors by using a structural alignment, or a sequence alignment 
produced on the basis of comparing two solved structures, dramatically reduces the errors 
in final models; these "gold standard" alignments can be used as input to current modeling
methods to produce quite accurate reproductions of the original experimental structure.
Results from the most recent CASP experiment suggest that "consensus" methods 
collecting the results of multiple fold recognition and multiple alignment searches increase 
the likelihood of identifying the correct template; similarly, the use of multiple templates in
the model-building step may be worse than the use of the single correct template but
better than the use of a single suboptimal one. Alignment errors may be
minimized by the use of a multiple alignment even if only one template is used, and by the 
iterative refinement of local regions of low similarity. A lesser source of model errors
is errors in the template structure. The PDBREPORT database
(http://swift.cmbi.ru.nl/gv/pdbreport/) lists several million, mostly very small but
occasionally dramatic, errors in experimental (template) structures that have been
deposited in the PDB.

Serious local errors can arise in homology models where an insertion or deletion mutation 
or a gap in a solved structure results in a region of target sequence for which there is no
corresponding template. This problem can be minimized by the use of multiple templates, 
but the method is complicated by the templates' differing local structures around the gap 
and by the likelihood that a missing region in one experimental structure is also missing in 
other structures of the same protein family. Missing regions are most common in loops 
where high local flexibility increases the difficulty of resolving the region by 
structure-determination methods. Although some guidance is provided even with a single 
template by the positioning of the ends of the missing region, the longer the gap, the more 
difficult it is to model. Loops of up to about 9 residues can be modeled with moderate 
accuracy in some cases if the local alignment is correct. Larger regions are often modeled 
individually using ab initio structure prediction techniques, although this approach has met 
with only isolated success.

The rotameric states of side chains and their internal packing arrangement also present
difficulties in homology modeling, even in targets for which the backbone structure is 
relatively easy to predict. This is partly due to the fact that many side chains in crystal 
structures are not in their "optimal" rotameric state as a result of energetic factors in the 
hydrophobic core and in the packing of the individual molecules in a protein crystal. One 
method of addressing this problem requires searching a rotameric library to identify locally 
low-energy combinations of packing states. It has been suggested that a major reason
that homology modeling is so difficult when target-template sequence identity lies below 30%
is that such proteins have broadly similar folds but widely divergent side chain packing 
arrangements. 

Utility 

Uses of the structural models include protein-protein interaction prediction, protein-protein 
docking, molecular docking, and functional annotation of genes identified in an organism's 
genome. Even low-accuracy homology models can be useful for these purposes, because 
their inaccuracies tend to be located in the loops on the protein surface, which are normally 
more variable even between closely related proteins. The functional regions of the protein, 
especially its active site, tend to be more highly conserved and thus more accurately 
modeled. [9] 

Homology models can also be used to identify subtle differences between related proteins 
that have not all been solved structurally. For example, the method was used to identify 
cation binding sites on the Na+/K+ ATPase and to propose hypotheses about different
ATPases' binding affinity. Used in conjunction with molecular dynamics simulations,
homology models can also generate hypotheses about the kinetics and dynamics of a 
protein, as in studies of the ion selectivity of a potassium channel. Large-scale 
automated modeling of all identified protein-coding regions in a genome has been 
attempted for the yeast Saccharomyces cerevisiae, resulting in nearly 1000 quality models 
for proteins whose structures had not yet been determined at the time of the study, and 
identifying novel relationships between 236 yeast proteins and other previously solved
structures.

See also 

• Protein structure prediction 

• Protein structure prediction software 

• Protein threading

Loop modeling

Loop modeling is a problem in protein structure prediction requiring the prediction of the 
conformations of loop regions in proteins without the use of a structural template. The 
problem arises often in homology modeling, where the tertiary structure of an amino acid 
sequence is predicted based on a sequence alignment to a template, or a second sequence 
whose structure is known. Because loops have highly variable sequences even within a 
given structural motif or protein fold, they often correspond to unaligned regions in 
sequence alignments; they also tend to be located at the solvent-exposed surface of 
globular proteins and thus are more conformationally flexible. Consequently, they often 
cannot be modeled using standard homology modeling techniques. More constrained 
versions of loop modeling are also used in the data fitting stages of solving a protein 
structure by X-ray crystallography, because loops can correspond to regions of low electron 
density and are therefore difficult to resolve. 

Regions of a structural model that were predicted by loop modeling tend to be much less 
accurate than regions that were predicted using template-based techniques. The extent of 
the inaccuracy increases with the number of amino acids in the loop. The loop amino acids'
side-chain dihedral angles are often approximated from a rotamer library, but this can worsen
the inaccuracy of side chain packing in the overall model. Andrej Sali's homology modeling
suite MODELLER includes a facility explicitly designed for loop modeling by a satisfaction 
of spatial restraints method. 

Short loops 

In general, the most accurate predictions are for loops of fewer than 8 amino acids. 
Extremely short loops of three residues can be determined from geometry alone, provided 
that the bond lengths and bond angles are specified. Slightly longer loops are often 
determined from a "spare parts" approach, in which loops of similar length are taken from 
known crystal structures and adapted to the geometry of the flanking segments. In some 
methods, the bond lengths and angles of the loop region are allowed to vary, in order to 
obtain a better fit; in other cases, the constraints of the flanking segments may be varied to 
find more "protein-like" loop conformations. Such short loops may be modeled
almost as accurately as the homology model upon which they are based. It should also be
considered that the loops in proteins may not be well-structured and therefore have no one 
conformation that could be predicted; NMR experiments indicate that solvent-exposed 
loops are "floppy" and adopt many conformations, while the loop conformations seen by 
X-ray crystallography may merely reflect crystal packing interactions, or the stabilizing 
influence of crystallization co-solvents.

External links

• MODLOOP, public server for access to MODELLER's loop modeling facility

MODELLER is a computer program used in producing homology models of protein tertiary
structures and, more rarely, quaternary structures. It implements a technique inspired by
nuclear magnetic resonance known as satisfaction of spatial restraints, by which a set of 
geometrical criteria are used to create a probability density function for the location of 
each atom in the protein. The method relies on an input sequence alignment between the 
target amino acid sequence to be modeled and a template protein whose structure has been 
solved. 

The program also incorporates limited functionality for ab initio structure prediction of loop 
regions of proteins, which are often highly variable even among homologous proteins and 
therefore difficult to predict by homology modeling. 

MODELLER was originally written and is currently maintained by Andrej Sali at the 
University of California, San Francisco. Although it is freely available for academic use, 
graphical user interfaces and commercial versions are distributed by Accelrys. 

External links 

• MODELLER [1]

Molecular models of DNA

Molecular models of DNA structures are representations of the molecular geometry and 
topology of deoxyribonucleic acid (DNA) molecules using one of several means, such as:
closely packed spheres (CPK models) made of plastic, metal wires for 'skeletal models', 
graphic computations and animations by computers, artistic rendering, and so on, with the 
aim of simplifying and presenting the essential, physical and chemical, properties of DNA 
molecular structures either in vivo or in vitro. Computer molecular models also allow 
animations and molecular dynamics simulations that are very important for understanding 
how DNA functions in vivo. For example, a long-standing dynamic problem is how DNA
"self-replication" takes place in living cells, a process that should involve transient
uncoiling of supercoiled DNA fibers. Although DNA consists of relatively rigid, very large elongated
biopolymer molecules called "fibers" or chains (that are made of repeating nucleotide units 
of four basic types, attached to deoxyribose and phosphate groups), its molecular structure 
in vivo undergoes dynamic configuration changes that involve dynamically attached water 
molecules and ions. Supercoiling, packing with histones in chromosome structures, and 
other such supramolecular aspects also involve in vivo DNA topology which is even more 
complex than DNA molecular geometry, thus turning molecular modeling of DNA into an 
especially challenging problem for both molecular biologists and biotechnologists. Like 
other large molecules and biopolymers, DNA often exists in multiple stable geometries (that 
is, it exhibits conformational isomerism) and configurational, quantum states which are 
close to each other in energy on the potential energy surface of the DNA molecule. Such 
geometries can also be computed, at least in principle, by employing ab initio quantum 
chemistry methods that have high accuracy for small molecules. Such quantum geometries 
define an important class of ab initio molecular models of DNA whose exploration has 
barely started. 

In an interesting twist of roles, the DNA molecule itself was proposed to 
be utilized for quantum computing. Both DNA nanostructures and
DNA 'computing' biochips have been built (see biochip image at right). 

The more advanced, computer-based molecular models of DNA involve 
molecular dynamics simulations as well as quantum mechanical 
computations of vibro-rotations, delocalized molecular orbitals (MOs), 
electric dipole moments, hydrogen-bonding, and so on. 



Figure: DNA computing biochip (3D).

Importance

From the very early stages of structural studies of DNA by X-ray 
diffraction and biochemical means, molecular models such as the 
Watson-Crick double-helix model were successfully employed to solve the 
'puzzle' of DNA structure, and also find how the latter relates to its key 
functions in living cells. The first high-quality X-ray diffraction patterns
of A-DNA were reported by Rosalind Franklin and Raymond Gosling in
1953. The first calculations of the Fourier transform of an atomic helix
were reported one year earlier by Cochran, Crick and Vand, and were
followed in 1953 by the computation of the Fourier transform of a
coiled-coil by Crick. The first reports of a double-helix molecular
model of B-DNA structure were made by Watson and Crick in 1953.

Last but not least, Maurice F. Wilkins, A. Stokes and H.R. Wilson
reported the first X-ray patterns of in vivo B-DNA in partially oriented
salmon sperm heads. The development of the first correct
double-helix molecular model of DNA by Crick and Watson may not have
been possible without the biochemical evidence for the nucleotide base-pairing ([A-T];
[C-G]), or Chargaff's rules [7] [8] [9] [10] [11] [12].




Figure: Spinning DNA generic model.



Examples of DNA molecular models 

Animated molecular models allow one to visually explore the three-dimensional (3D) 
structure of DNA. The first DNA model is a space-filling, or CPK, model of the DNA 
double-helix whereas the third is an animated wire, or skeletal type, molecular model of 
DNA. The last two DNA molecular models in this series depict quadruplex DNA that
may be involved in certain cancers. The last figure on this panel is a molecular
model of hydrogen bonds between water molecules in ice that are similar to those found in
DNA.

Figure: Hydrogen bonds.

• Spacefilling model or CPK model - a molecule is represented by overlapping spheres 
representing the atoms.

Figure: DNA spacefilling molecular model.

Images for DNA Structure Determination from X-Ray Patterns

The following images illustrate both the principles and the main steps involved in 
generating structural information from X-ray diffraction studies of oriented DNA fibers with 
the help of molecular models of DNA that are combined with crystallographic and 
mathematical analysis of the X-ray patterns. From left to right the gallery of images shows: 

• First row. 

• 1. Constructive X-ray interference, or diffraction, following Bragg's Law of X-ray 
"reflection by the crystal planes"; 

• 2. A comparison of A-DNA (crystalline) X-ray diffraction patterns and highly hydrated
B-DNA (paracrystalline) X-ray scattering patterns (courtesy of Dr. Herbert R. Wilson,
FRS; see refs. list);

• 3. Purified DNA precipitated in a water jug; 

• 4. The major steps involved in DNA structure determination by X-ray crystallography 
showing the important role played by molecular models of DNA structure in this iterative, 
structure-determination process; 

• Second row.

• 5. Photo of a modern X-ray diffractometer employed for recording X-ray patterns of DNA
with major components: X-ray source, goniometer, sample holder, X-ray detector and/or 
plate holder; 

• 6. Illustrated animation of an X-ray goniometer; 

• 7. X-ray detector at the SLAC synchrotron facility; 

• 8. Neutron scattering facility at ISIS in UK; 

• Third and fourth rows: Molecular models of DNA structure at various scales; figure 
#11 is an actual electron micrograph of a DNA fiber bundle, presumably of a single 
bacterial chromosome loop. 
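The Bragg condition named in item 1 relates the X-ray wavelength λ, the spacing d between crystal planes, and the glancing angle θ through nλ = 2d sin θ. The following is a minimal sketch of that relation; the function names are chosen here for illustration and are not taken from any of the tools shown in the images.

```python
import math

def bragg_spacing(wavelength, theta_deg, order=1):
    """Plane spacing d from Bragg's law: order * wavelength = 2 * d * sin(theta)."""
    s = math.sin(math.radians(theta_deg))
    if s <= 0:
        raise ValueError("theta must lie strictly between 0 and 180 degrees")
    return order * wavelength / (2.0 * s)

def bragg_angle(wavelength, spacing, order=1):
    """Glancing angle theta (degrees) at which planes of the given spacing diffract."""
    ratio = order * wavelength / (2.0 * spacing)
    if ratio > 1:
        raise ValueError("no diffraction: order * wavelength exceeds 2 * spacing")
    return math.degrees(math.asin(ratio))
```

Both functions use whatever length unit is passed in, as long as the wavelength and the spacing share it. A call such as bragg_angle(1.5418, 3.4) would give the first-order angle for a 3.4 Å repeat, where both numbers are illustrative inputs rather than values taken from the images.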
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Paracrystalline lattice models of B-DNA structures 

A paracrystalline lattice, or paracrystal, is a molecular or atomic lattice with significant
amounts (e.g., larger than a few percent) of partial disordering of molecular
arrangements. Limiting cases of the paracrystal model are nanostructures, such as
glasses, liquids, etc., that may possess only local ordering and no global order. Liquid
crystals also have paracrystalline rather than crystalline structures.
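The idea of local order without global order can be illustrated with a one-dimensional toy lattice. The sketch below assumes a simple model in which every nearest-neighbour distance fluctuates independently, so positional errors accumulate along the chain. The function names and default parameters are hypothetical and are not taken from any DNA modelling package.

```python
import random

def paracrystal_positions(n_sites, spacing=1.0, sigma=0.05, rng=None):
    """Positions of n_sites points on a 1D paracrystalline lattice.

    Each nearest-neighbour distance is drawn independently from a normal
    distribution (mean `spacing`, standard deviation `sigma`), so the
    disorder accumulates with distance from the origin.
    """
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    rng = rng or random.Random()
    positions = [0.0]
    for _ in range(n_sites - 1):
        positions.append(positions[-1] + rng.gauss(spacing, sigma))
    return positions

def displacement_from_ideal(positions, spacing=1.0):
    """Deviation of each site from the perfect lattice position i * spacing."""
    return [x - i * spacing for i, x in enumerate(positions)]
```

With sigma set to zero the function returns a perfect lattice. With a nonzero sigma, neighbouring sites stay close to the mean spacing while distant sites drift increasingly far from their ideal positions.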
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DNA Helix controversy in 1952 
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Highly hydrated B-DNA occurs naturally in living cells in such a paracrystalline state, which 
is a dynamic one in spite of the relatively rigid DNA double-helix stabilized by parallel 
hydrogen bonds between the nucleotide base-pairs in the two complementary, helical DNA 
chains (see figures). For simplicity most DNA molecular models omit both water and ions
dynamically bound to B-DNA, and are thus less useful for understanding the dynamic
behaviors of B-DNA in vivo. The physical and mathematical analysis of X-ray and
spectroscopic data for paracrystalline B-DNA is therefore much more complicated than that
of crystalline A-DNA X-ray diffraction patterns. The paracrystal model is also important for
DNA technological applications such as DNA nanotechnology. Novel techniques that
combine X-ray diffraction of DNA with X-ray microscopy in hydrated living cells are now
also being developed (see, for example, "Application of X-ray microscopy in the analysis of
living hydrated cells").

Genomic and Biotechnology Applications of DNA molecular 
modeling 

The following gallery of images illustrates various uses of DNA molecular modeling in 
Genomics and Biotechnology research applications from DNA repair to PCR and DNA 
nanostructures; each slide contains its own explanation and/or details. The first slide 
presents an overview of DNA applications, including DNA molecular models, with emphasis 
on Genomics and Biotechnology. 

Gallery: DNA Molecular modeling applications 
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Databases for DNA molecular models and sequences 



X-ray diffraction 

• NDB ID: UD0017 Database [13] 

• X-ray Atlas -database [19] 

• PDB files of coordinates for nucleic acid structures from X-ray diffraction by NA (incl.
DNA) crystals [20]

• Structure factors downloadable files in CIF format
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Neutron scattering 

• ISIS neutron source 

• ISIS pulsed neutron source: A world centre for science with neutrons & muons at
Harwell, near Oxford, UK. [22]



X-ray microscopy 

• Application of X-ray microscopy in the analysis of living hydrated cells [18]



Electron microscopy 

• DNA under electron microscope [23]



Atomic Force Microscopy (AFM) 

Two-dimensional DNA junction arrays have been visualized by Atomic Force Microscopy
(AFM). Other imaging resources for AFM/Scanning probe microscopy (SPM) can be
freely accessed at:

• How SPM Works [25] 

• SPM Image Gallery - AFM STM SEM MFM NSOM and more. [26] 

Gallery of AFM Images 
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Mass spectrometry— Maldi informatics 



[Figure: MALDI informatics workflow with the stages data acquisition, peak detection (list
of peak masses, list of peak intensities), and genotype, mutations, etc.]
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Spectroscopy 

• Vibrational circular dichroism (VCD) 

• FT-NMR [27] [28] 

• NMR Atlas-database [29] 

• mmcif downloadable coordinate files of nucleic acids in solution from 2D-FT NMR data
[30]

• NMR constraints files for NAs in PDB format [31]

• NMR microscopy

• Microwave spectroscopy

• FT-IR

• FT-NIR [33] [34] [35]

• Spectral, Hyperspectral, and Chemical imaging [36] [37] [38] [39] [40] [41] [42]

• Raman spectroscopy/microscopy and CARS

• Fluorescence correlation spectroscopy [45] [46] [47] [48] [49] [50] [51] [52], Fluorescence
cross-correlation spectroscopy and FRET

• Confocal microscopy [56]
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Gallery: CARS (Raman spectroscopy), Fluorescence confocal 
microscopy, and Hyperspectral imaging 
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Genomic and structural databases 

• CBS Genome Atlas Database — contains examples of base skews. 

• The Z curve database of genomes — a 3-dimensional visualization and analysis tool of 
genomes [59][60] . 

• DNA and other nucleic acids' molecular models: Coordinate files of nucleic acids 
molecular structure models in PDB and CIF formats 
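The Z curve and base-skew representations mentioned in this list are both simple functions of cumulative base counts. The sketch below assumes the commonly used definitions: Z curve components x = (A+G)-(C+T), y = (A+C)-(G+T), z = (A+T)-(C+G) over each sequence prefix, and GC skew = (G-C)/(G+C). The function names are hypothetical and do not correspond to the interface of either database.

```python
def z_curve(sequence):
    """Return the Z curve of a DNA sequence as a list of (x, y, z) points.

    Point n is computed from the base counts in the first n bases:
    x = purines - pyrimidines, y = amino - keto, z = weak - strong H-bond bases.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points

def gc_skew(sequence):
    """GC skew (G - C) / (G + C) of a sequence; 0.0 if it has no G or C."""
    seq = sequence.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0
```

For a genome-scale analysis the skew would normally be evaluated in sliding windows rather than over the whole sequence; that choice is left out of this sketch.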
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See also 

DNA 

Molecular graphics 

DNA structure 

DNA Dynamics 

X-ray scattering 

Neutron scattering 

Crystallography 

Crystal lattices 

Paracrystalline lattices/Paracrystals 

2D-FT NMRI and Spectroscopy 

NMR Spectroscopy 

Microwave spectroscopy 

Two-dimensional IR spectroscopy 

Spectral imaging 

Hyperspectral imaging 

Chemical imaging 

NMR microscopy 

VCD or Vibrational circular dichroism 

FRET and FCS- Fluorescence correlation spectroscopy 

Fluorescence cross-correlation spectroscopy (FCCS) 

Molecular structure 

Molecular geometry 

Molecular topology 

DNA topology 

Sirius visualization software 

Nanostructure 

DNA nanotechnology 

Imaging 

Atomic force microscopy 

X-ray microscopy 

Liquid crystal 

Glasses 

QMC@Home 
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External links 

DNA the Double Helix Game (http://nobelprize.org/educational_games/medicine/dnadoublehelix/) From the official Nobel Prize web site

MDDNA: Structural Bioinformatics of DNA (http://humphry.chem.wesleyan.edu:8080/MDDNA/)

Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre for Biotechnology Education

DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html)

Ascalaph DNA (http://www.agilemolecule.com/Ascalaph/Ascalaph_DNA.html), commercial software for DNA modeling

DNAlive: a web interface to compute DNA physical properties (http://mmb.pcb.ub.es/DNAlive). Also allows cross-linking of the results with the UCSC Genome browser and DNA dynamics.

DiProDB: Dinucleotide Property Database (http://diprodb.fli-leibniz.de). The database is designed to collect and analyse thermodynamic, structural and other dinucleotide properties.

Further details of mathematical and molecular analysis of DNA structure based on X-ray data (http://planetphysics.org/encyclopedia/BesselFunctionsApplicationsToDiffractionByHelicalStructures.html)

Bessel functions corresponding to Fourier transforms of atomic or molecular helices (http://planetphysics.org/?op=getobj&from=objects&name=BesselFunctionsAndTheirApplicationsToDiffractionByHelicalStructures)

Application of X-ray microscopy in analysis of living hydrated cells (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12379938)

Characterization in nanotechnology: some PDFs (http://nanocharacterization.sitesled.com/)

Overview of STM/AFM/SNOM principles with educational videos (http://www.ntmdt.ru/SPM-Techniques/Principles/)

SPM Image Gallery - AFM STM SEM MFM NSOM and More (http://www.rhk-tech.com/results/showcase.php)

How SPM Works (http://www.parkafm.com/New_html/resources/01general.php)

U.S. National DNA Day (http://www.genome.gov/10506367): watch videos and participate in real-time discussions with scientists.

The Secret Life of DNA - DNA Music compositions (http://www.tjmitchell.com/stuart/dna.html)
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List of nucleic acid simulation 
software 
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This is a list of computer programs that are used for nucleic acids simulations. 

Min - Optimization. MD - Molecular Dynamics. MC - Monte Carlo.

Crt - Cartesian coordinates. Int - Internal coordinates. Exp - Explicit water. Imp - Implicit
water.

Lig - Ligands interactions. HA - Hardware accelerated.
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See also 

Molecular Modelling 

Molecular graphics 

Molecular mechanics 

Molecular dynamics 

Molecular Design software 

Quantum chemistry computer programs 

List of RNA structure prediction software 

List of protein structure prediction software 

List of software for molecular mechanics modeling 

Force field 

Force field implementation 
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Folding@home 




The PlayStation 3 Folding@home client displays a 3D model of the protein being simulated 



Original author(s): Vijay Pande

Developer(s): Stanford University / Pande Group

Initial release: 2000-10-01

Stable release:
Windows: 6.23 (Uniprocessor), 6.23 (GPU)
Mac OS X: 6.20 (PPC-Uniprocessor), 6.20 (x86-SMP)
Linux: 6.02 (Uniprocessor), 6.02 (x64-SMP)
PlayStation 3: 1.4 [1]
/ 2008-11-26 (Windows 6.23)

Preview release:
6.23beta (Windows SMP)
6.24beta (Linux x64-SMP)
6.24beta (Mac OS X x86-SMP)
/ 2009-01-20 (6.24betas)

Platform: Cross-platform

Available in: English

Type: Distributed computing

License: Proprietary [2]

Website: folding.stanford.edu [3]



Folding@home (sometimes abbreviated as FAH or F@h) is a distributed computing (DC) 
project designed to perform computationally intensive simulations of protein folding and 
other molecular dynamics (MD). It was launched on October 1, 2000, and is currently 
managed by the Pande Group, within Stanford University's chemistry department, under 
the supervision of Professor Vijay Pande. Folding@home is the most powerful distributed 
computing cluster in the world, according to Guinness, and one of the world's largest 
distributed computing projects. The goal of the project is "to understand protein folding, 
misfolding, and related diseases." 
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Purpose 

Accurate simulations of protein folding and misfolding enable the scientific community to 
better understand the development of many diseases, including sickle-cell disease 
(drepanocytosis), Alzheimer's disease, Parkinson's disease, mad cow disease, cancer, 
Huntington's disease, cystic fibrosis, osteogenesis imperfecta, alpha 1-antitrypsin
deficiency, and other aggregation-related diseases. [7] More fundamentally, understanding
the process of protein folding (how biological molecules assemble themselves into a
functional state) is one of the outstanding problems of molecular biology. So far, the
Folding@home project has successfully simulated folding in the 5-10 microsecond range,
a far longer timescale than was previously thought possible to model. The
Pande Group's goal is to refine and improve the MD and Folding@home DC methods to the
level where they become an essential tool for MD research. To that end the group
collaborates with various scientific institutions. As of February 19, 2009, sixty-three
scientific research papers have been published using the project's work. A University of
Illinois at Urbana-Champaign report dated October 22, 2002 states that Folding@home
distributed simulations of protein folding are demonstrably accurate.
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Function 

Folding@home does not rely on powerful 
supercomputers for its data processing; 
instead, the primary contributors to the 
Folding@home project are many hundreds 
of thousands of personal computer users 
who have installed a small client program. 
The client will, at the user's choice, run in
the background, utilizing otherwise unused
CPU power, or run as a screensaver only
while the user is away. In most modern
personal computers, the CPU is rarely used
to its full capacity; the
Folding@home client takes advantage of
this unused processing power.
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Folding@home when running takes advantage of 

unused CPU cycles on a computer system as shown by 

this computer's 99% CPU usage. 



The Folding@home client periodically connects to a server to retrieve "work units", which
are packets of data upon which to perform calculations. Each completed work unit is then
sent back to the server. As data integrity is a major concern for all distributed computing
projects, all work units are validated through the use of a 2048-bit digital signature.
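The text does not say which signature scheme the project uses, so the following is only a sketch of the general idea: a work unit is accepted for processing only if its signature verifies. It assumes a hash-then-sign RSA scheme and uses a deliberately tiny toy key (n = 61 × 53) so that it fits in a few lines. All names here are hypothetical, and a real system would use a vetted cryptography library and a key of 2048 bits or more.

```python
import hashlib

# Toy RSA key for illustration only: n = 61 * 53, e * d = 1 (mod 3120).
N, E, D = 3233, 17, 2753

def digest(payload: bytes) -> int:
    """SHA-256 digest of the payload, reduced modulo the toy modulus."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N

def sign(payload: bytes) -> int:
    """Server side: sign the digest with the private exponent."""
    return pow(digest(payload), D, N)

def verify(payload: bytes, signature: int) -> bool:
    """Client side: check the signature with the public exponent."""
    return pow(signature, E, N) == digest(payload)

def process_work_unit(payload: bytes, signature: int, compute):
    """Run `compute` on a work unit only if its signature is valid."""
    if not verify(payload, signature):
        raise ValueError("work unit failed signature check")
    return compute(payload)
```

Reducing the digest modulo such a small number makes collisions easy to find, which is one of several reasons this toy key offers no real security.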

Contributors to Folding@home may have user names used to keep track of their 
contributions. Each user may be running the client on one or more CPUs; for example, a 
user with two computers could run the client on both of them. Users may also contribute 
under one or more team names; many different users may join together to form a team. 
Contributors are assigned a score indicating the number and difficulty of completed work 
units. Rankings and other statistics are posted to the Folding@home website. 
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Analysis Software 

The Folding@home client utilizes modified versions of five molecular simulation programs
for calculation: TINKER, GROMACS, AMBER, CPMD, and SHARPEN. [12] Where possible,
optimizations are used to speed the process of calculation. There are many variations on
these base simulation programs, each of which is given an arbitrary identifier (Core xx):



Active Cores 

• GROMACS (all variants of this core use SIMD optimizations including SSE, 3DNow+ or
AltiVec, where available, unless otherwise specified)

• Gromacs (Core 78)

• Available for all Uniprocessor clients only.

• DGromacs (Core 79)

• Double precision Gromacs, uses SSE2 only.

• Available for all Uniprocessor clients only.

• DGromacsB (Core 7b)

• Nominally an update of DGromacs, but is actually based on the SMP/GPU codebases
(and is therefore a completely new core). As a result, both are still in use.

• Double precision Gromacs, uses SSE2 only.

• Available for all Uniprocessor clients only.

• DGromacsC (Core 7c)

• Double precision Gromacs, uses SSE2 only.

• Available on Windows and Linux Uniprocessor clients only.

• GBGromacs (Core 7a)

• Gromacs with the Generalized Born implicit solvent model.

• Available for all Uniprocessor clients only.

• Gromacs SREM (Core 80)

• Gromacs Serial Replica Exchange Method.

• The Gromacs Serial Replica Exchange Method core, also known as GroST (Gromacs
Serial replica exchange with Temperatures), uses the Replica Exchange method
(also known as REMD or Replica Exchange Molecular Dynamics) in its simulations.

• Available for Windows and Linux Uniprocessor clients only.

• GroSimT (Core 81)

• Gromacs with Simulated Tempering.

• Available for Windows and Linux Uniprocessor clients only.

• Gromacs 33 (Core a0)

• Uses the Gromacs 3.3 codebase.

• Available for all Uniprocessor clients only.

• Gro-SMP (Core a1)

• Symmetric Multiprocessing variant, locked to four threads (but can be run on dual
core processors).

• Runs only on multi-core x86 or x64 hardware, uses SSE only.

• Available for all SMP clients only.

• GroCVS (Core a2)

• Symmetric Multiprocessing variant with scalable numbers of threads.
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• Runs only on multi-core x86 or x64 hardware, with four or more cores, uses SSE 
only. 

• Uses the Gromacs 4.0 codebase. 

• Available for Linux and Mac OS X SMP clients only. 

• GroGPU2 (Core 11) 

• Graphics Processing Unit variant for ATI 
CAL-enabled and nVidia CUDA-enabled GPUs. 

• Comes in two separate versions, one each for 
ATI and nVidia, but both have the same Core ID. 

• GPUs do not support SIMD optimizations by 
design, so none are used in this core. 

• Available for GPU2 client only. 

NVIDIA GPU v2.0 r1 client for Windows.

• ATI-DEV (Core 12)

• Graphics Processing Unit developmental core for ATI CAL-enabled GPUs.

• Does not support SIMD optimizations. 

• Available for GPU2 client only. 

• NVIDIA-DEV (Core 13) 

• Graphics Processing Unit developmental core for nVidia CUDA-enabled GPUs. 

• Does not support SIMD optimizations. 

• Available for GPU2 client only. 

• GroGPU2-MT (Core 14) [14] 

• Graphics Processing Unit variant for nVidia CUDA-enabled GPUs. 

• Contains additional debugging code compared to the standard Core 11. 

• Does not support SIMD optimizations. 

• Released March 2, 2009. 

• Available for GPU2 client only. 

• Gro-PS3 (Does not have a known ID number, but also called SCEARD core) 

• PlayStation 3 variant. 

• No SIMD optimizations, uses SPE cores for optimization. 

• Available for PS3 client only.

• AMBER

• PMD (Core 82) [13]

• No optimizations. 

• Available for Windows and Linux Uniprocessor clients only. 
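The replica exchange method named under the Gromacs SREM core (Core 80) rests on a Metropolis-style swap test between simulations running at different temperatures. The sketch below shows the standard criterion for conventional (parallel) replica exchange, in which two replicas i and j swap configurations with probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). It is not the serial variant implemented by the core, whose details are not given here, and the function names and the choice of kcal/mol units are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def exchange_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Probability of swapping configurations between replicas i and j.

    e_i, e_j are the potential energies of the two replicas and
    t_i, t_j their temperatures in kelvin.
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted by the Metropolis test."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse swap is accepted only with a probability that shrinks as the temperature and energy gaps grow.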
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Inactive Cores 

• TINKER 

• Tinker core (Core 65) 

• Currently inactive, as the GBGromacs core (Core 7a) performs the same tasks much 
faster. 

• No optimizations. 

• Available for all Uniprocessor clients only. 

• GROMACS 

• GroGPU (Core 10) 

• Graphics Processing Unit variant for ATI series 1xxx GPUs.

• GPUs do not have optimizations; no SIMD optimizations needed since GPU cores are 
explicitly designed for SIMD. 

• Inactive as of June 6, 2008 due to end of distribution of GPU1 client units. 

• Available for GPU1 client only. 

• CPMD 

• QMD (Core 96) 

• Currently inactive, due to QMD developer graduating from Stanford University and 
due to current research shifting away from Quantum MD. 

• Caused controversy due to SSE2 issues involving Intel libraries and AMD processors.

• Uses SSE2 (currently only on Intel CPUs, see above). 

• Available for Windows and Linux Uniprocessor clients only. 

• SHARPEN [16] 

• SHARPEN Core [17] 

• Currently inactive, in closed beta testing before general release. 

• Uses different format to standard F@H cores, as there is more than one "Work Unit" 
(using the normal definition) in each work packet sent to clients. 
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Possible future additions 

• ProtoMol [9] 



Participation 

Shortly after breaking the 200,000 active 
CPU count on September 20, 2005, the 
Folding@home project celebrated its fifth 
anniversary on October 1, 2005. 

Interest and participation in the project 
has grown steadily since its launch. The 
number of active devices participating in 
the project increased substantially after 
receiving much publicity during the 
launch of their high performance clients 
for both ATi graphics cards and the 
PlayStation 3, and again following the 
launch of the high performance client for 
nVidia graphics cards. 
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Folding@home computing power shown - by device type 
- in TeraFLOPS as recorded semi-daily from November 

2006 until September 2007. Note the large spike in total 
compute power after March 22, when the PlayStation 3 
client was released. 



As of April 9, 2009 the peak speed of the project overall has reached over 4.5
native PFLOPS (8.1 x86 PFLOPS [18]) from around 400,000 active machines, and the project
has received computational results from over 3.75 million devices since it first started.



Google & Folding@home 

There used to be cooperation between Folding@home and Google Labs in the form of
Google Toolbar. Google Compute supported Folding@home during its early stage, when
Folding@home had about 10,000 active CPUs. At that time, a boost of 20,000 machines was
very significant. The project later gained a large number of active CPUs while the number
of new clients joining Google Compute remained very low (most people opted for the
Folding@home client instead), so Google Compute was discontinued. The Google Compute
clients also had certain limits: they could only run the TINKER core and had limited naming
and team options. Folding@home is no longer supported on Google Toolbar, and even the
old Google Toolbar client will not work.



Genome@home 

Folding@home absorbed the Genome@home project on March 8, 2004. The work which 
was started by the Genome@home project has since been completed using the 
Folding@home network (the work units without deadlines), and no new work is being 
distributed by this project. All donors were encouraged to download the Folding@home
client (the F@h 4.xx client had a Genome@home option), and once the Genome@home 
work was complete these clients were asked to donate their processing power to the 
Folding@home project instead. 
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PetaFLOPS Milestones 



Native petaFLOPS Barrier    Date Crossed
1.0                         September 16, 2007
2.0                         early May 2008
3.0                         August 20, 2008
4.0                         September 28, 2008
5.0                         February 18, 2009



On September 16, 2007, the Folding@home project officially attained a sustained
performance level higher than one native petaFLOPS, becoming the first computing system
of any kind in the world to ever do so, although it had briefly peaked above one native
petaFLOPS in March 2007, receiving a large amount of mainstream media coverage for
doing so. [20] [21] In early May 2008 the project attained a sustained performance level
higher than two native petaFLOPS, followed by the three and four native petaFLOPS
milestones on August 20 and September 28, 2008 respectively. On February 18, 2009,
Folding@home achieved a performance level of 5033 native TFLOPS, thereby becoming the
first computing system of any kind to surpass 5 native PFLOPS [22], just as it was for the
other four milestones.

The Folding@home computing cluster currently operates at above 4.5 native petaFLOPS at
all times, with a large majority of the performance coming from GPU and PlayStation 3
clients. In comparison, the fastest standalone supercomputer (non-distributed
computing) in the world (as of November 2008, the U.S. Department of Energy's Roadrunner)
peaks at approximately 1.46 petaFLOPS.

In April 2009, Folding@home began reporting performance in both "Native"
FLOPS and x86 FLOPS, with the "x86" FLOPS reported at a much higher mark than the
"Native" FLOPS. [5] A detailed explanation of the difference between the two figures was
given in the FLOP section of the Folding@home FAQ.
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Results 

These peer-reviewed papers (in chronological order) all use research from the
Folding@home project. [10]
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• Michael R. Shirts and Vijay S. Pande (2001). "Mathematical Analysis of Coupled Parallel 
Simulations". Physical Review Letters 86 (22): 4983-4987. 
doi:10.1103/PhysRevLett.86.4983 [26] . 

• Bojan Zagrovic, Eric J. Sorin and Vijay Pande (2001). "b-Hairpin Folding Simulations in 
Atomistic Detail Using an Implicit Solvent Model". Journal of Molecular Biology 313: 
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High performance platforms 

Graphical processing units 

On October 2, 2006, the Folding@home Windows GPU client was released to the public as a
beta test. After 9 days of processing from the beta client the Folding@home project had
received 31 teraFLOPS of computational performance from just 450 ATI Radeon X1900
GPUs, averaging over 70x the performance of current CPU submissions, and the GPU
clients remain the most powerful clients available in terms of performance per client (as of
March 11, 2009, GPU clients accounted for over 60% of the entire project's throughput at
an approximate ratio of 9 clients per teraFLOPS; nVidia clients currently lead ATI clients in
overall contribution and in performance per client). On April 10, 2008, the second
generation Windows GPU client was released to open beta testing, supporting ATI/AMD's
Radeon HD 2000 and HD 3000 series, and also debuting a new core (GROGPU2 - Core 11).
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Inaccuracies with DirectX were cited as the main reason for the migration to the new 

version (the original GPU client was officially retired June 6, 2008), which uses
AMD/ATI's CAL. On June 17, 2008, a version of the second-generation Windows GPU client
for CUDA-enabled Nvidia GPUs was also released for public beta testing. The GPU
clients proved reliable enough to be promoted out of the beta phase and were officially
released August 1, 2008. Newer GPU cores continue to be released for both CAL and
CUDA. No word has to date been given over future support for OpenCL or DirectX 11's
Compute Shaders.
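The per-client throughput implied by the GPU figures quoted in this section (31 teraFLOPS from 450 GPUs during the 2006 beta, and about 9 clients per teraFLOPS in March 2009) can be checked with a short calculation. This is a minimal sketch; the function name is chosen here for illustration.

```python
def gflops_per_client(total_teraflops, clients):
    """Average throughput per client in gigaFLOPS."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return total_teraflops * 1000.0 / clients

beta_2006 = gflops_per_client(31, 450)  # 2006 beta: 31 teraFLOPS from 450 GPUs
march_2009 = gflops_per_client(1, 9)    # 2009: about 9 clients per teraFLOPS
```

On these figures the average GPU client delivered roughly 69 gigaFLOPS during the beta and roughly 111 gigaFLOPS by March 2009.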

While the only officially released GPU v2.0 client is for Windows, this client can be run on
Linux under Wine with NVIDIA graphics cards. The client can operate on both 32- and
64-bit Linux platforms, but in either case the 32-bit CUDA toolkit is required. This
configuration is not officially supported, though initial results have shown comparable
performance to that of the native client and no problems with the scientific results have
been found. An unofficial installation guide has been published.

PlayStation 3 

Stanford announced in August 2006 that a folding client was available to run on the Sony
PlayStation 3. The intent was that gamers would be able to contribute to the project by
merely "contributing electricity", leaving their PlayStation 3 consoles running the client
while not playing games. PS3 firmware version 1.6 (released on Thursday, March 22, 2007)
allows for Folding@home software, a 50 MB download, to be used on the PS3. [5] A peak
output of the project at 990 teraFLOPS was achieved on 25 March, 2007, at which time the
number of FLOPS from each PS3 as reported by Stanford fell, reducing the overall speed
rating of those machines by 50%. This had the effect of bumping down the overall project
speed to the mid 700 teraFLOPS range and increasing the number of active PS3s required to
achieve a petaFLOPS level to around 60,000.

The PlayStation 3's Life With PlayStation client replaced the Folding@home application on 18
September, 2008.

On April 26, 2007, Sony released a new version of Folding@home which improved folding
performance drastically, such that the updated PS3 clients produced 1500 teraFLOPS with
52,000 clients versus the previous 400 teraFLOPS by around 24,000 clients. [86] Lately, the
console accounts for around 26% of all teraFLOPS at an approximate ratio of 35½ PS3
clients per teraFLOPS.

On December 19, 2007, Sony again updated the Folding@home client to version 1.3 to
allow users to play music stored on their hard drives while contributing. Another feature of
the 1.3 update allows users to automatically shut down their console after current work is
done or after a limited period of time (for example 3 or 4 hours). Also, the software update
added the Generalized Born implicit solvent model, so the FAH PS3 client gained broader
computing capabilities. Shortly afterward, 1.3.1 was released to solve a
mishandling of protocol resulting in difficulties sending and receiving Work Units due to
heavy server loads stemming from the fault.
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On 18 September, 2008 the Folding@home client became Life With PlayStation. In addition 
to the existing functionality, the application also provides the user with access to 
information "channels", the first of which being the Live Channel which offers news 
headlines and weather through a 3D globe. The user can rotate and zoom in to any part of 
the world to access information provided by Google News and The Weather Channel, 
among other sources, all running whilst folding in the background. This update also 
provided more advanced simulation of protein folding and a new ranking system.



Multi-core processing client 

As more modern CPUs are released, multi-core processors
are becoming more widely adopted by the public, and
the Pande Group is adding symmetric multiprocessing (SMP)
support to the Folding@home client in the hopes of
capturing the additional processing power. The SMP support
is achieved by utilizing Message Passing Interface
protocols. In its current state it is confined to a single
node by hard-coded usage of localhost.

On November 13, 2006, the beta SMP Folding@home clients 
for x86-64 Linux and x86 Mac OS X were released. The beta 
Win32 SMP Folding@home client is out as well, and a 32-bit 
Linux client is currently in development. 
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Folding@home teams 

A typical Folding@home user, running the client on a single PC, will likely not be ranked 
high on the list of contributors. However, if the user were to join a team, they would add 
the points they receive to a larger collective. Teams work by using the combined score of all 
their members. Thus, teams are ranked much higher than individual submitters. Rivalries 
between teams create friendly competition that benefits the folding community. Many 
teams publish their own stats, so members can have intra-team competitions for top 
spots. Some teams offer prizes in an attempt to increase participation in the project.



Development 

The Folding@home project does not make the project source code available to the public,
citing security and integrity concerns. At the same time, the majority of the scientific
codes used by FAH (e.g., Cosm, GROMACS, TINKER, AMBER, CPMD, BrookGPU) are
largely open-source software or under similar licenses.

A development version of Folding@home once ran on the open source BOINC framework; 
however, this version remained unreleased. 
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Estimated energy consumption 

A PlayStation 3 has a maximum power rating of 380 watts. As Folding@home is a CPU
intensive application, it causes 100% utilization. However, according to Stanford's PS3
FAQ, "We expect the PS3 to use about 200W while running Folding@home." As of
December 27, 2008, there are 55,291 PS3s providing 1,559,000,000 MFlops of processing
power. This amounts to 28,196 MFlops/PS3 and, with Stanford's estimate of 200W per PS3
(for original units manufactured on the 90nm process) [5], 140.98 MFlops/watt. This would
put the PS3 portion of Folding@home at 95th on the November 2008 Green500 list. [97] The
Cell processors used in current units of the PlayStation 3 utilize 65nm technology (lowering
power consumption to around 115W per PS3), with another upgrade to 45nm planned
(further dropping consumption to around 80W/PS3). This will further increase the power
efficiency of the contribution from PlayStation 3 units.
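The efficiency figures in this paragraph follow from two divisions, sketched below. The inputs are the numbers quoted in the text; the function name is chosen here for illustration.

```python
def ps3_efficiency(total_mflops, consoles, watts_per_console):
    """Return (MFlops per console, MFlops per watt) for a fleet of consoles."""
    per_console = total_mflops / consoles
    return per_console, per_console / watts_per_console

# Figures quoted for December 27, 2008, with the 200 W estimate for 90nm units.
per_ps3, per_watt = ps3_efficiency(1_559_000_000, 55_291, 200)
```

Repeating the call with the 115 W and 80 W estimates given for the 65nm and 45nm units yields roughly 245 and 352 MFlops/watt, assuming the throughput per console stays the same.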

The total power consumption required to produce the processing power required by the
project can be estimated based upon the average FLOPS per watt. As of November 2008,
according to the Green500 list, the most efficient computer, also based on a version of the
Cell BE, runs at 536.24 MFLOPS/watt. [98] One petaFLOPS equals 1,000,000,000 MFLOPS.
Therefore, the current Folding@home project, if it were theoretically using the most
efficient CPUs currently available, would use at least 1.86 megawatts of power per
petaFLOPS (1,000,000,000 / 536.24 watts), somewhat less than the 2.345MW used by the
world's first and only petaflop system, the Cell-based Roadrunner. This is equivalent to the
power needed to light roughly 19,000 to 31,000 standard house light bulbs (between 100
and 60 watts each), or the
equivalent of 0.5-3 electrical wind mills depending on their size.
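The estimate above is a single division: the required FLOPS over the efficiency in FLOPS per watt. The sketch below works it through with the 536.24 MFLOPS/watt figure quoted from the Green500 list; the function names are chosen here for illustration.

```python
def megawatts_needed(petaflops, mflops_per_watt):
    """Power in megawatts needed to sustain `petaflops` at the given efficiency."""
    mflops = petaflops * 1_000_000_000  # 1 petaFLOPS = 1e9 MFLOPS
    return mflops / mflops_per_watt / 1_000_000

def bulbs_equivalent(megawatts, watts_per_bulb):
    """Number of light bulbs of the given wattage drawing the same power."""
    return megawatts * 1_000_000 / watts_per_bulb

per_petaflops = megawatts_needed(1, 536.24)
```

With these inputs the result is about 1.86 MW per petaFLOPS, which corresponds to roughly 31,000 bulbs at 60 watts or 19,000 bulbs at 100 watts.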

Estimates of power usage per time period are more difficult than estimates of power usage 
per processing instruction. This is because Folding@home clients are often run on 
computers that would be powered-on even in the absence of the Folding@home client, and 
that run other programs simultaneously. While Folding@home increases processor 
utilization, and thus (usually) power consumption, the extent to which it does so is 
dependent on the client processor's normal operating load, and its ability to reduce clock 
speeds when presented with less-than-full utilization (a process known as dynamic 
frequency scaling). Consequently, the total power usage of the Folding@home client on a 
temporal basis is probably less than the figure that could be calculated by summing the 
peak power consumption of each of the project's component processors. 

See also 

Blue Gene 

Grid computing 

List of distributed computing projects 

Rosetta@Home 

Software for molecular modeling 

Molecular modeling on GPU

Molecular Dynamics, Theories and Computational Methods

Classical mechanics 

In physics, classical mechanics is one of the two major sub-fields of study in the science 
of mechanics, which is concerned with the set of physical laws governing and 
mathematically describing the motions of bodies and aggregates of bodies geometrically 
distributed within a certain boundary under the action of a system of forces. The other 
sub-field is quantum mechanics. 

Classical mechanics is used for describing the motion of macroscopic objects, from 
projectiles to parts of machinery, as well as astronomical objects, such as spacecraft, 
planets, stars, and galaxies. It produces very accurate results within these domains, and is 
one of the oldest and largest subjects in science, engineering and technology. 

Besides this, many related specialties exist, dealing with gases, liquids, and solids, and so 
on. Classical mechanics is enhanced by special relativity for objects moving with high 
velocity, approaching the speed of light; general relativity is employed to handle gravitation 
at a deeper level; and quantum mechanics handles the wave-particle duality of atoms and 
molecules. 

The term classical mechanics was coined in the early 20th century to describe the system of 
mathematical physics begun by Isaac Newton and many contemporary 17th century natural 
philosophers, building upon the earlier astronomical theories of Johannes Kepler, which in 
turn were based on the precise observations of Tycho Brahe and the studies of terrestrial 
projectile motion of Galileo, but before the development of quantum physics and relativity. 
Therefore, some sources exclude so-called "relativistic physics" from that category. 
However, a number of modern sources do include Einstein's mechanics, which in their view 
represents classical mechanics in its most developed and most accurate form. The initial 
stage in the development of classical mechanics is often referred to as Newtonian 
mechanics, and is associated with the physical concepts employed by and the mathematical 
methods invented by Newton himself, in parallel with Leibniz, and others. This is further 
described in the following sections. More abstract and general methods include Lagrangian 
mechanics and Hamiltonian mechanics. Much of the content of classical mechanics was 
created in the 18th and 19th centuries and extends considerably beyond (particularly in its 
use of analytical mathematics) the work of Newton.

Description of the theory

The following introduces the basic concepts of classical 
mechanics. For simplicity, it often models real-world 
objects as point particles, objects with negligible size. 
The motion of a point particle is characterized by a 
small number of parameters: its position, mass, and the 
forces applied to it. Each of these parameters is 
discussed in turn. 

In reality, the kind of objects which classical mechanics can describe always have a
non-zero size. (The physics of very small particles, such as the electron, is more
accurately described by quantum mechanics.) Objects with non-zero size have more
complicated behavior than hypothetical point particles, because of the additional degrees
of freedom: for example, a baseball can spin while it is moving. However, the results for
point particles can be used to study such objects by treating them as composite objects,
made up of a large number of interacting point particles. The center of mass of a composite
object behaves like a point particle.




The analysis of projectile motion is a 
part of classical mechanics. 



Position and its derivatives 



The SI derived "mechanical" (that is, not electromagnetic or thermal) units with kg, m and s

Quantity                        Unit
Position                        m
Angular position / Angle        unitless (radian)
Velocity                        m s⁻¹
Angular velocity                s⁻¹
Acceleration                    m s⁻²
Angular acceleration            s⁻²
Jerk                            m s⁻³
"Angular jerk"                  s⁻³
Specific energy                 m² s⁻²
Absorbed dose rate              m² s⁻³
Moment of inertia               kg m²
Momentum                        kg m s⁻¹
Angular momentum                kg m² s⁻¹
Force                           kg m s⁻²
Torque                          kg m² s⁻²
Energy                          kg m² s⁻²
Power                           kg m² s⁻³
Pressure and energy density     kg m⁻¹ s⁻²
Surface tension                 kg s⁻²
Spring constant                 kg s⁻²
Irradiance and energy flux      kg s⁻³
Kinematic viscosity             m² s⁻¹
Dynamic viscosity               kg m⁻¹ s⁻¹
Density (mass density)          kg m⁻³
Density (weight density)        kg m⁻² s⁻²
Number density                  m⁻³
Action                          kg m² s⁻¹
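
The entries in the table of mechanical units follow from one another by multiplying and dividing the base units kg, m and s. A minimal sketch of that bookkeeping is given below; the `Dim` class and its printing format are illustrative assumptions and not part of any standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of the SI base units kg, m and s (hypothetical helper)."""
    kg: int = 0
    m: int = 0
    s: int = 0

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds the unit exponents.
        return Dim(self.kg + other.kg, self.m + other.m, self.s + other.s)

    def __truediv__(self, other: "Dim") -> "Dim":
        # Dividing quantities subtracts the unit exponents.
        return Dim(self.kg - other.kg, self.m - other.m, self.s - other.s)

    def __str__(self) -> str:
        parts = []
        for name, exp in (("kg", self.kg), ("m", self.m), ("s", self.s)):
            if exp == 1:
                parts.append(name)
            elif exp != 0:
                parts.append(f"{name}^{exp}")
        return " ".join(parts) or "1"


MASS, LENGTH, TIME = Dim(kg=1), Dim(m=1), Dim(s=1)

velocity = LENGTH / TIME
acceleration = velocity / TIME
momentum = MASS * velocity
force = MASS * acceleration
energy = force * LENGTH
power = energy / TIME
pressure = force / (LENGTH * LENGTH)
action = energy * TIME

if __name__ == "__main__":
    for name, dim in [("force", force), ("energy", energy),
                      ("power", power), ("pressure", pressure),
                      ("action", action)]:
        print(f"{name}: {dim}")
```

Each derived quantity is built only from quantities defined before it, so a mistake in one row of such a table shows up as an inconsistency in the rows that depend on it.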



The position of a point particle is defined with respect to an arbitrary fixed reference point, 
O, in space, usually accompanied by a coordinate system, with the reference point located 
at the origin of the coordinate system. It is defined as the vector r from O to the particle. In 
general, the point particle need not be stationary relative to O, so r is a function of t, the 
time elapsed since an arbitrary initial time. In pre-Einstein relativity (known as Galilean 
relativity), time is considered an absolute, i.e., the time interval between any given pair of 
events is the same for all observers. In addition to relying on absolute time, classical
mechanics assumes Euclidean geometry for the structure of space. [1]



Velocity and speed 

The velocity, or the rate of change of position with time, is defined as the derivative of the 
position with respect to time or

$$\mathbf{v} = \frac{d\mathbf{r}}{dt}.$$

In classical mechanics, velocities are directly additive and subtractive. For example, if one
car traveling east at 60 km/h passes another car traveling east at 50 km/h, then from the
perspective of the slower car, the faster car is traveling east at 60 - 50 = 10 km/h.
From the perspective of the faster car, the slower car is moving at 10 km/h to the
west. Velocities are directly additive as vector quantities; they must be dealt with using
vector analysis.

Mathematically, if the velocity of the first object in the previous discussion is denoted by
the vector $\mathbf{u} = u\mathbf{d}$ and the velocity of the second object by the vector
$\mathbf{v} = v\mathbf{e}$, where $u$ is the speed of the first object, $v$ is the speed of
the second object, and $\mathbf{d}$ and $\mathbf{e}$ are unit vectors in the directions of
motion of each particle respectively, then the velocity of the first object as seen by the
second object is:

$$\mathbf{u}' = \mathbf{u} - \mathbf{v}$$

Similarly:

$$\mathbf{v}' = \mathbf{v} - \mathbf{u}$$

When both objects are moving in the same direction, this equation can be simplified to:

$$\mathbf{u}' = (u - v)\mathbf{d}$$

Or, by ignoring direction, the difference can be given in terms of speed only:

$$u' = u - v$$
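
The two-car example can be written as a vector subtraction. The sketch below assumes a flat two-dimensional (east, north) coordinate system; the function and variable names are chosen here for illustration only.

```python
from typing import Tuple

Vector = Tuple[float, float]  # (east, north) components in km/h


def relative_velocity(observed: Vector, observer: Vector) -> Vector:
    """Velocity of `observed` as seen from `observer` (classical subtraction)."""
    return (observed[0] - observer[0], observed[1] - observer[1])


fast_car: Vector = (60.0, 0.0)   # traveling east at 60 km/h
slow_car: Vector = (50.0, 0.0)   # traveling east at 50 km/h

fast_seen_from_slow = relative_velocity(fast_car, slow_car)
slow_seen_from_fast = relative_velocity(slow_car, fast_car)

if __name__ == "__main__":
    print("fast car seen from slow car:", fast_seen_from_slow)
    print("slow car seen from fast car:", slow_seen_from_fast)
```

A negative east component means motion to the west, so the slower car appears to recede westward at the same speed at which the faster car appears to advance eastward.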

Acceleration 

The acceleration, or rate of change of velocity, is the derivative of the velocity with respect 
to time (the second derivative of the position with respect to time) or

$$\mathbf{a} = \frac{d\mathbf{v}}{dt}.$$

Acceleration can arise from a change with time of the magnitude of the velocity or of the
direction of the velocity or both. If only the magnitude, $v$, of the velocity decreases, this
is sometimes referred to as deceleration, but generally any change in the velocity with time,
including deceleration, is simply referred to as acceleration.

Frames of reference 

While the position and velocity and acceleration of a particle can be referred to any 
observer in any state of motion, classical mechanics assumes the existence of a special 
family of reference frames in terms of which the mechanical laws of nature take a 
comparatively simple form. These special reference frames are called inertial frames. They 
are characterized by the absence of acceleration of the observer and the requirement that 
all forces entering the observer's physical laws originate in identifiable sources (charges, 
gravitational bodies, and so forth). A non-inertial reference frame is one accelerating with 
respect to an inertial one, and in such a non-inertial frame a particle is subject to 
acceleration by fictitious forces that enter the equations of motion solely as a result of its 
accelerated motion, and do not originate in identifiable sources. These fictitious forces are 
in addition to the real forces recognized in an inertial frame. A key concept of inertial 
frames is the method for identifying them. (See inertial frame of reference for a discussion.) 
For practical purposes, reference frames that are unaccelerated with respect to the distant 
stars are regarded as good approximations to inertial frames. 

The following consequences can be derived about the perspective of an event in two inertial
reference frames, $S$ and $S'$, where $S'$ is traveling at a relative velocity of
$\mathbf{u}$ to $S$.

• $\mathbf{v}' = \mathbf{v} - \mathbf{u}$ (the velocity $\mathbf{v}'$ of a particle from the
perspective of $S'$ is slower by $\mathbf{u}$ than its velocity $\mathbf{v}$ from the
perspective of $S$)

• $\mathbf{a}' = \mathbf{a}$ (the acceleration of a particle is the same in any inertial
reference frame)

• $\mathbf{F}' = \mathbf{F}$ (the force on a particle is the same in any inertial reference
frame)

• the speed of light is not a constant in classical mechanics, nor does the special position 
given to the speed of light in relativistic mechanics have a counterpart in classical 
mechanics. 

• the form of Maxwell's equations is not preserved across such inertial reference frames. 
However, in Einstein's theory of special relativity, the assumed constancy (invariance) of 
the vacuum speed of light alters the relationships between inertial reference frames so as 
to render Maxwell's equations invariant.

Forces; Newton's Second Law

Newton was the first to mathematically express the relationship between force and 
momentum. Some physicists interpret Newton's second law of motion as a definition of 
force and mass, while others consider it to be a fundamental postulate, a law of nature. 
Either interpretation has the same mathematical consequences, historically known as
"Newton's Second Law":

$$\mathbf{F} = \frac{d\mathbf{p}}{dt} = \frac{d(m\mathbf{v})}{dt}.$$

The quantity $m\mathbf{v}$ is called the (canonical) momentum. The net force on a particle is
thus equal to the rate of change of momentum of the particle with time. Since the definition
of acceleration is $\mathbf{a} = d\mathbf{v}/dt$, the second law can, when the mass is
constant, be written in the simplified and more familiar form

$$\mathbf{F} = m\mathbf{a}.$$

So long as the force acting on a particle is known, Newton's second law is sufficient to
describe the motion of a particle. Once independent relations for each force acting on a
particle are available, they can be substituted into Newton's second law to obtain an
ordinary differential equation, which is called the equation of motion.

As an example, assume that friction is the only force acting on the particle, and that it may
be modeled as a function of the velocity of the particle, for example:

$$\mathbf{F}_R = -\lambda\mathbf{v}$$

with $\lambda$ a positive constant. Then the equation of motion is

$$-\lambda\mathbf{v} = m\mathbf{a} = m\frac{d\mathbf{v}}{dt}.$$

This can be integrated to obtain

$$\mathbf{v} = \mathbf{v}_0 e^{-\lambda t/m}$$

where $\mathbf{v}_0$ is the initial velocity. This means that the velocity of this particle
decays exponentially to zero as time progresses. In this case, an equivalent viewpoint is
that the kinetic energy of the particle is absorbed by friction (which converts it to heat
energy in accordance with the conservation of energy), slowing it down. This expression can
be further integrated to obtain the position $\mathbf{r}$ of the particle as a function of
time.
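
The friction example can also be checked numerically in one dimension. The sketch below integrates the equation of motion with a classical fourth-order Runge-Kutta step (a choice made here, not one the text prescribes) and compares the result with the closed-form velocity and with the position obtained by integrating it once more, x(t) = (m v0 / λ)(1 - e^(-λt/m)) for a particle starting at the origin. The parameter values are arbitrary.

```python
import math


def analytic_velocity(v0: float, lam: float, m: float, t: float) -> float:
    """Closed-form solution v(t) = v0 * exp(-lam * t / m)."""
    return v0 * math.exp(-lam * t / m)


def analytic_position(v0: float, lam: float, m: float, t: float) -> float:
    """Integral of v(t) from 0 to t, for a particle starting at x = 0."""
    return (m * v0 / lam) * (1.0 - math.exp(-lam * t / m))


def simulate(v0: float, lam: float, m: float, t_end: float, steps: int):
    """Integrate m dv/dt = -lam * v and dx/dt = v with RK4 steps."""
    dt = t_end / steps
    x, v = 0.0, v0

    def accel(vel: float) -> float:
        return -lam * vel / m

    for _ in range(steps):
        k1v, k1x = accel(v), v
        k2v, k2x = accel(v + 0.5 * dt * k1v), v + 0.5 * dt * k1v
        k3v, k3x = accel(v + 0.5 * dt * k2v), v + 0.5 * dt * k2v
        k4v, k4x = accel(v + dt * k3v), v + dt * k3v
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    return x, v


# Arbitrary illustrative values: mass in kg, lam in kg/s, v0 in m/s.
M, LAM, V0, T_END = 2.0, 0.5, 3.0, 4.0
x_num, v_num = simulate(V0, LAM, M, T_END, steps=400)
v_exact = analytic_velocity(V0, LAM, M, T_END)
x_exact = analytic_position(V0, LAM, M, T_END)

if __name__ == "__main__":
    print(f"v: numeric {v_num:.6f}, analytic {v_exact:.6f}")
    print(f"x: numeric {x_num:.6f}, analytic {x_exact:.6f}")
```

The position approaches the finite limit m v0 / λ, so under this friction model the particle travels only a bounded distance.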

Important forces include the gravitational force and the Lorentz force for
electromagnetism. In addition, Newton's third law can sometimes be used to deduce the
forces acting on a particle: if it is known that particle A exerts a force $\mathbf{F}$ on
another particle B, it follows that B must exert an equal and opposite reaction force,
$-\mathbf{F}$, on A. The strong form of Newton's third law requires that $\mathbf{F}$ and
$-\mathbf{F}$ act along the line connecting A and B, while the weak form does not.
Illustrations of the weak form of Newton's third law are often found for magnetic forces.

Energy

If a force $\mathbf{F}$ is applied to a particle that achieves a displacement
$\Delta\mathbf{r}$, the work done by the force is defined as the scalar product of the force
and displacement vectors (noting that the displacement vector is the change in the position
vector):

$$W = \mathbf{F} \cdot \Delta\mathbf{r}.$$

If the mass of the particle is constant, and $W_\mathrm{total}$ is the total work done on the
particle, obtained by summing the work done by each applied force, then from Newton's second
law:

$$W_\mathrm{total} = \Delta E_k,$$

where $E_k$ is called the kinetic energy. For a point particle, it is mathematically defined
as the amount of work done to accelerate the particle from zero velocity to the given
velocity $v$:

$$E_k = \tfrac{1}{2} m v^2.$$

For extended objects composed of many particles, the kinetic energy of the composite body
is the sum of the kinetic energies of the particles.

A particular class of forces, known as conservative forces, can be expressed as the gradient
of a scalar function, known as the potential energy and denoted $E_p$:

$$\mathbf{F} = -\nabla E_p.$$

If all the forces acting on a particle are conservative, and $E_p$ is the total potential
energy (defined as the work done by the forces involved in rearranging the mutual positions
of bodies), obtained by summing the potential energies corresponding to each force, then

$$\mathbf{F} \cdot \Delta\mathbf{r} = -\nabla E_p \cdot \Delta\mathbf{r} = -\Delta E_p \;\Rightarrow\; -\Delta E_p = \Delta E_k \;\Rightarrow\; \Delta(E_k + E_p) = 0.$$

This result is known as conservation of energy and states that the total energy,

$$\sum E = E_k + E_p,$$

is constant in time. It is often useful, because many commonly encountered forces are
conservative.
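
As an illustration of both the work-energy relation and conservation of energy, consider a body dropped from rest in uniform gravity, a conservative force. The sketch below uses the closed-form free-fall solution; the mass, the drop height and the value of g are arbitrary choices made for this example.

```python
from typing import Tuple

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def free_fall_state(h0: float, t: float) -> Tuple[float, float]:
    """Height and vertical velocity at time t for a drop from rest at h0."""
    return h0 - 0.5 * G * t * t, -G * t


def kinetic_energy(mass: float, v: float) -> float:
    return 0.5 * mass * v * v


def potential_energy(mass: float, y: float) -> float:
    return mass * G * y


MASS = 0.145  # kg, illustrative
H0 = 20.0     # m, illustrative

# Total energy E_k + E_p sampled at several times during the fall.
energies = []
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    y, v = free_fall_state(H0, t)
    energies.append(kinetic_energy(MASS, v) + potential_energy(MASS, y))

# Work done by gravity over the first two seconds: W = F * (displacement).
y_end, v_end = free_fall_state(H0, 2.0)
work_by_gravity = (-MASS * G) * (y_end - H0)
delta_kinetic = kinetic_energy(MASS, v_end) - kinetic_energy(MASS, 0.0)

if __name__ == "__main__":
    print("total energy samples:", [round(e, 6) for e in energies])
    print("work by gravity:", work_by_gravity, "change in E_k:", delta_kinetic)
```

The sampled total energy stays constant to rounding error, and the work done by gravity matches the gain in kinetic energy.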

Beyond Newton's Laws 

Classical mechanics also includes descriptions of the complex motions of extended 
non-pointlike objects. Euler's laws provide extensions to Newton's laws in this area. The 
concepts of angular momentum rely on the same calculus used to describe one-dimensional 
motion. 

There are two important alternative formulations of classical mechanics: Lagrangian 
mechanics and Hamiltonian mechanics. These, and other modern formulations, usually 
bypass the concept of "force", instead referring to other physical quantities, such as energy, 
for describing mechanical systems. 

Classical transformations 

Consider two reference frames $S$ and $S'$. For observers in each of the reference frames an
event has space-time coordinates of $(x, y, z, t)$ in frame $S$ and $(x', y', z', t')$ in
frame $S'$. Assuming time is measured the same in all reference frames, and if we require
$x = x'$ when $t = 0$, then the relation between the space-time coordinates of the same event
observed from the reference frames $S'$ and $S$, which are moving at a relative velocity of
$u$ in the $x$ direction, is:

$$x' = x - ut$$
$$y' = y$$
$$z' = z$$
$$t' = t$$

This set of formulas defines a group transformation known as the Galilean transformation
(informally, the Galilean transform). This group is a limiting case of the Poincaré group
used in special relativity. The limiting case applies when the velocity $u$ is very small
compared to $c$, the speed of light.
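
The Galilean transformation is simple enough to state directly in code. The sketch below, with an illustrative `Event` type that is not taken from any library, transforms two events on the path of a uniformly moving particle and recovers the velocity subtraction rule from them.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Space-time coordinates of an event in some frame (illustrative type)."""
    x: float
    y: float
    z: float
    t: float


def galilean_transform(event: Event, u: float) -> Event:
    """Coordinates in S', which moves at velocity u along x relative to S."""
    return Event(event.x - u * event.t, event.y, event.z, event.t)


def inverse_galilean_transform(event: Event, u: float) -> Event:
    """Back from S' to S: the same transformation with velocity -u."""
    return galilean_transform(event, -u)


# A particle moving along x in frame S with x = 1 + 5 t (arbitrary values).
U = 2.0
first_s = Event(x=6.0, y=0.0, z=0.0, t=1.0)
second_s = Event(x=16.0, y=0.0, z=0.0, t=3.0)

first_sp = galilean_transform(first_s, U)
second_sp = galilean_transform(second_s, U)

velocity_in_s = (second_s.x - first_s.x) / (second_s.t - first_s.t)
velocity_in_s_prime = (second_sp.x - first_sp.x) / (second_sp.t - first_sp.t)

if __name__ == "__main__":
    print("velocity in S :", velocity_in_s)
    print("velocity in S':", velocity_in_s_prime)
```

Because time is left unchanged, the velocity measured in S' differs from the one measured in S by exactly u, and applying the inverse transformation restores the original coordinates.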

For some problems, it is convenient to use rotating coordinates (reference frames). Thereby 
one can either keep a mapping to a convenient inertial frame, or introduce additionally a 
fictitious centrifugal force and Coriolis force. 

History 

Some Greek philosophers of antiquity, among them Aristotle, may have been the first to 
maintain the idea that "everything happens for a reason" and that theoretical principles can 
assist in the understanding of nature. While to a modern reader, many of these preserved 
ideas come forth as eminently reasonable, there is a conspicuous lack of both mathematical 
theory and controlled experiment, as we know it. These both turned out to be decisive 
factors in forming modern science, and they started out with classical mechanics. 

An early experimental scientific method was introduced into mechanics in the 11th century 
by al-Biruni, who along with al-Khazini in the 12th century, unified statics and dynamics 
into the science of mechanics, and combined the fields of hydrostatics with dynamics to 
create the field of hydrodynamics. Concepts related to Newton's laws of motion were also 
enunciated by several other Muslim physicists during the Middle Ages. Early versions of the 
law of inertia, known as Newton's first law of motion, and the concept relating to 
momentum, part of Newton's second law of motion, were described by Ibn al-Haytham 
(Alhacen) and Avicenna. The proportionality between force and acceleration, an
important principle in classical mechanics, was first stated by Hibat Allah Abu'l-Barakat
al-Baghdaadi, [7] and theories on gravity were developed by Ja'far Muhammad ibn Musa ibn
Shakir, [8] Ibn al-Haytham, [9] and al-Khazini. [10] It is known that Galileo Galilei's
mathematical treatment of acceleration and his concept of impetus grew out of earlier
medieval analyses of motion, especially those of Avicenna, Ibn Bajjah, and Jean
Buridan.

The first published causal explanation of the motions of planets was Johannes Kepler's 
Astronomia nova published in 1609. He concluded, based on Tycho Brahe's observations of 
the orbit of Mars, that the orbits were ellipses. This break with ancient thought was 
happening around the same time that Galilei was proposing abstract mathematical laws for 
the motion of objects. He may (or may not) have performed the famous experiment of 
dropping two cannon balls of different masses from the tower of Pisa, showing that they 
both hit the ground at the same time. The reality of this experiment is disputed, but, more 
importantly, he did carry out quantitative experiments by rolling balls on an inclined plane. 
His theory of accelerated motion derived from the results of such experiments, and forms a 
cornerstone of classical mechanics.

As a foundation for his principles of natural philosophy, Newton proposed three laws of
motion: the law of inertia, his second law of acceleration (mentioned above), and the law of
action and reaction; and hence laid the foundations for classical mechanics. Both Newton's
second and third laws were given proper scientific and mathematical treatment in Newton's
Philosophiae Naturalis Principia Mathematica, which distinguishes them from earlier
attempts at explaining similar phenomena, which were either incomplete, incorrect, or
given little accurate mathematical expression. Newton also enunciated the principles of
conservation of momentum and angular momentum. In mechanics, Newton was also the
first to provide a correct scientific and mathematical formulation of gravity, in
Newton's law of universal gravitation. The combination of Newton's laws of motion and
gravitation provides the fullest and most accurate description of classical mechanics. He
demonstrated that these laws apply to everyday objects as well as to celestial objects. In
particular, he obtained a theoretical explanation of Kepler's laws of motion of the planets.

Newton had previously invented the calculus and used it to perform the
mathematical calculations. For acceptability, his book, the Principia, was formulated
entirely in terms of the long-established geometric methods, which were soon to be eclipsed
by his calculus. However, it was Leibniz who developed the notation of the derivative and
integral preferred today.

Newton, and most of his contemporaries, with the notable exception of Huygens, worked on 
the assumption that classical mechanics would be able to explain all phenomena, including 
light, in the form of geometric optics. Even when discovering the so-called Newton's rings 
(a wave interference phenomenon) his explanation remained with his own corpuscular 
theory of light. 

After Newton, classical mechanics became a principal field of study in mathematics as well 
as physics. 

Some difficulties were discovered in the late 19th century that could only be resolved by 
more modern physics. Some of these difficulties related to compatibility with 
electromagnetic theory, and the famous Michelson-Morley experiment. The resolution of 
these problems led to the special theory of relativity, often included in the term classical 
mechanics. 

A second set of difficulties were related to thermodynamics. When combined with 
thermodynamics, classical mechanics leads to the Gibbs paradox of classical statistical 
mechanics, in which entropy is not a well-defined quantity. Black-body radiation was not 
explained without the introduction of quanta. As experiments reached the atomic level, 
classical mechanics failed to explain, even approximately, such basic things as the energy 
levels and sizes of atoms and the photo-electric effect. The effort at resolving these 
problems led to the development of quantum mechanics. 

Since the end of the 20th century, the place of classical mechanics in physics has been no
longer that of an independent theory. Emphasis has shifted to understanding the
fundamental forces of nature as in the Standard model and its more modern extensions into
a unified theory of everything. Classical mechanics is a theory for the study of the
motion of non-quantum mechanical, low-energy particles in weak gravitational fields.


Limits of validity 

Many branches of classical mechanics are simplifications or approximations of more
accurate forms; two of the most accurate being general relativity and relativistic
statistical mechanics.

Geometric optics is an approximation to the quantum theory of light, and does not have a
superior "classical" form.

[Figure: Domain of validity for Classical Mechanics. Regions shown: Classical Mechanics,
Relativistic Mechanics, Quantum Mechanics, Quantum Field Theory; the speed axis runs from
far less than 3×10⁸ m/s to comparable to 3×10⁸ m/s.]

The Newtonian approximation to special relativity

Newtonian, or non-relativistic classical momentum

$$\mathbf{p} = m_0\mathbf{v}$$

is the result of the first order Taylor approximation of the relativistic expression:

$$\mathbf{p} = \frac{m_0\mathbf{v}}{\sqrt{1 - v^2/c^2}} = m_0\mathbf{v}\left(1 + \frac{1}{2}\frac{v^2}{c^2} + \cdots\right), \quad \text{where } v = |\mathbf{v}|,$$

when expanded about

$$\frac{v}{c} = 0,$$

so it is only valid when the velocity is much less than the speed of light. Quantitatively
speaking, the approximation is good so long as

$$\left(\frac{v}{c}\right)^2 \ll 1.$$
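
To get a feel for how quickly the Newtonian approximation degrades, the relative error of the classical momentum can be tabulated against v/c. This is a minimal sketch; the function names are illustrative, and the leading-order error of roughly (v/c)²/2 follows from the Taylor expansion of the relativistic expression.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def newtonian_momentum(m0: float, v: float) -> float:
    return m0 * v


def relativistic_momentum(m0: float, v: float) -> float:
    return m0 * v / math.sqrt(1.0 - (v / C) ** 2)


def relative_error(v: float, m0: float = 1.0) -> float:
    """Fraction by which the Newtonian momentum falls short of the exact one."""
    return 1.0 - newtonian_momentum(m0, v) / relativistic_momentum(m0, v)


if __name__ == "__main__":
    for beta in (0.01, 0.1, 0.5, 0.9):
        err = relative_error(beta * C)
        print(f"v/c = {beta:4.2f}: error = {err:.6f}, "
              f"(v/c)^2 / 2 = {0.5 * beta ** 2:.6f}")
```

For small v/c the two printed columns agree closely, and they separate as v approaches c, where the higher-order terms of the expansion can no longer be neglected.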



For example, the relativistic cyclotron frequency of a cyclotron, gyrotron, or high voltage
magnetron is given by $f = f_c \frac{m_0}{m_0 + T/c^2}$, where $f_c$ is the classical
frequency of an electron (or other charged particle) with kinetic energy $T$ and (rest) mass
$m_0$ circling in a magnetic field. The (rest) mass of an electron is 511 keV/c². So the
frequency correction is 1% for a magnetic vacuum tube with a 5.11 kV direct current
accelerating voltage.

The classical approximation to quantum mechanics

The ray approximation of classical mechanics breaks down when the de Broglie wavelength
is not much smaller than other dimensions of the system. For non-relativistic particles, this
wavelength is

$$\lambda = \frac{h}{p}$$

where $h$ is Planck's constant and $p$ is the momentum.

Again, this happens with electrons before it happens with heavier particles. For example, 
the electrons used by Clinton Davisson and Lester Germer in 1927, accelerated by 54 volts, 
had a wavelength of 0.167 nm, which was long enough to exhibit a single diffraction side
lobe when reflecting from the face of a nickel crystal with atomic spacing of 0.215 nm. With 
a larger vacuum chamber, it would seem relatively easy to increase the angular resolution 
from around a radian to a milliradian and see quantum diffraction from the periodic 
patterns of integrated circuit computer memory. 
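
The wavelength quoted for the Davisson-Germer electrons can be reproduced from the non-relativistic relation between kinetic energy and momentum, p = sqrt(2 m e V). The sketch below uses SI values for the physical constants; the function name is an illustrative choice.

```python
import math

PLANCK_H = 6.62607015e-34            # Planck's constant in J s
ELECTRON_MASS = 9.1093837015e-31     # electron rest mass in kg
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge in C


def electron_wavelength_m(voltage: float) -> float:
    """Non-relativistic de Broglie wavelength of an electron accelerated
    from rest through `voltage` volts."""
    kinetic_energy = ELEMENTARY_CHARGE * voltage          # joules
    momentum = math.sqrt(2.0 * ELECTRON_MASS * kinetic_energy)
    return PLANCK_H / momentum


wavelength_nm = electron_wavelength_m(54.0) * 1e9

if __name__ == "__main__":
    print(f"54 V electron: {wavelength_nm:.3f} nm")
```

The result is comparable to the 0.215 nm atomic spacing of the nickel crystal, which is why diffraction was observable.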

More practical examples of the failure of classical mechanics on an engineering scale are 
conduction by quantum tunneling in tunnel diodes and very narrow transistor gates in 
integrated circuits. 

Classical mechanics is the same extreme high frequency approximation as geometric optics. 
It is more often accurate because it describes particles and bodies with rest mass. These 
have more momentum and therefore shorter De Broglie wavelengths than massless 
particles, such as light, with the same kinetic energies. 



Branches 

Classical mechanics was traditionally divided into three main branches:

• Statics, the study of equilibrium and its relation to forces

• Dynamics, the study of motion and its relation to forces

• Kinematics, dealing with the implications of observed motions without regard for
circumstances causing them

[Figure: Branches of mechanics]

Another division is based on the choice of mathematical formalism:

• Newtonian mechanics 

• Lagrangian mechanics 

• Hamiltonian mechanics 

Alternatively, a division can be made by region of application: 

• Celestial mechanics, relating to stars, planets and other celestial bodies 

• Continuum mechanics, for materials which are modelled as a continuum, e.g., solids and 
fluids (i.e., liquids and gases). 

• Relativistic mechanics (i.e. including the special and general theories of relativity), for 
bodies whose speed is close to the speed of light. 



• Statistical mechanics, which provides a framework for relating the microscopic 
properties of individual atoms and molecules to the macroscopic or bulk thermodynamic 
properties of materials. 

See also 

• History of classical mechanics 

• Dynamical systems 

• List of equations in classical mechanics 

• List of publications in classical mechanics 

• Molecular dynamics 

• Newton's laws of motion 

• Special theory of relativity

[2] Mariam Rozhanskaya and I. S. Levinova (1996), "Statics", in Roshdi Rashed, ed., Encyclopedia of the History of
Arabic Science, Vol. 2, p. 614-642 [642], Routledge, London and New York
[8] Robert Briffault (1938). The Making of Humanity, p. 191.
[9] Nader El-Bizri (2006), "Ibn al-Haytham or Alhazen", in Josef W. Meri (2006), Medieval Islamic Civilization: An
Encyclopaedia, Vol. II, p. 343-345, Routledge, New York, London.
[10] Mariam Rozhanskaya and I. S. Levinova (1996), "Statics", in Roshdi Rashed, ed., Encyclopaedia of the History
of Arabic Science, Vol. 2, p. 622. London and New York: Routledge.
[11] Galileo Galilei, Two New Sciences, trans. Stillman Drake, (Madison: Univ. of Wisconsin Pr., 1974), pp 217,
225, 296-7.
[12] Ernest A. Moody (1951). "Galileo and Avempace: The Dynamics of the Leaning Tower Experiment (I)", Journal

Newton's laws of motion



Newton's laws of motion are three physical laws that form the basis for classical
mechanics. They are:

1. A body at rest stays at rest, and a body in motion stays in motion, unless it is acted on
by an external force.

2. Force equals mass times acceleration (F = ma) (or alternately, force equals the time rate
of change of momentum).

3. To every action there is an equal and opposite reaction.

They describe the relationship between the forces acting on a body and the motion of the
body. They were first compiled by Sir Isaac Newton in his work Philosophiae Naturalis
Principia Mathematica, first published on July 5, 1687. Newton used them to explain and
investigate the motion of many physical objects and systems. For example, in the third
volume of the text, Newton showed that these laws of motion, combined with his law of
universal gravitation, explained Kepler's laws of planetary motion.

[Figure: Newton's First and Second laws, in Latin, from the original 1687 edition of the
Principia Mathematica.]



The three laws 



First law 

There exists a set of inertial reference frames relative to which all particles with no net
force acting on them will move without change in their velocity. This law is often
simplified as "A body persists in its state of rest or of uniform motion unless acted upon
by an external unbalanced force." Newton's first law is often referred to as the law of
inertia.

Second law

Observed from an inertial reference frame, the net force on a particle is proportional
to the time rate of change of its linear momentum: F = d(mv)/dt. This law is often
stated as, "Force equals mass times acceleration (F = ma)": the net force on an object
is equal to the mass of the object multiplied by its acceleration.

Third law

Whenever a particle A exerts a force on another particle B, B simultaneously exerts a
force on A with the same magnitude in the opposite direction. The strong form of the
law further postulates that these two forces act along the same line. This law is often
simplified into the sentence, "To every action there is an equal and opposite reaction."

In the given interpretation mass, acceleration, momentum, and (most importantly) force are
assumed to be externally defined quantities. This is the most common, but not the only
interpretation: one can consider the laws to be a definition of these quantities. Notice that
the second law only holds when the observation is made from an inertial reference frame,
and since an inertial reference frame is defined by the first law, asking for a proof of the
first law from the second law is a logical fallacy. At speeds approaching the speed of light
the effects of special relativity must be taken into account.

Newton's first law: law of inertia 

Lex I: Corpus omne perseverare in statu suo quiescendi vel movendi uniformiter 
in directum, nisi quatenus a viribus impressis cogitur statum ilium mutare. Every 
body persists in its state of being at rest or of moving uniformly straight 
forward, except insofar as it is compelled to change its state by force 
impressed. 

Newton's first law is also called the law of inertia. In a simplified form, it states that if the
vector sum of all forces (also known as the net force) acting on an object is zero, then the
state of motion of the object does not change: an object at rest remains at rest and an
object in motion remains in motion unless acted on by an unbalanced force. In particular:

• An object that is not moving will not move until a net force acts upon it.

• An object that is moving will not change its velocity (accelerate) until a net force acts
upon it.

The first point needs no comment, but the second seems to violate everyday experience. A 
hockey puck sliding along a table doesn't move forever; rather, it slows and eventually 
comes to a stop. According to Newton's laws, though, the hockey puck does not stop of its 
own accord, but because of a force applied in the opposite direction to the direction of 
motion. That force is easily identified as a frictional force between the table and the puck. 
In the absence of such a force, as approximated by an air hockey table or ice rink, the 
puck's motion would not slow. 

There are no perfect demonstrations of the law, as friction usually causes a force to act on a 
moving body, and even in outer space gravitational forces act and cannot be shielded 
against, but the law serves to emphasize the elementary causes of changes in an object's 
state of motion. 

The above treatment of Newton's first law is an over-simplification, though. A more 
sophisticated approach to the law of inertia is given by: 

There is a class of frames of reference (called inertial frames) relative to 
which the motion of a particle not subject to forces is a straight line. 

Newton placed the law of inertia first to establish frames of reference for which the other 
laws are applicable (see Galili & Tseitlin, or Woodhouse). Such frames are called
inertial frames. 

To understand why the laws are restricted to inertial frames, consider a ball at rest within 
an accelerating body: an airplane on a runway will suffice for this example. From the 
perspective of anyone within the airplane (that is, from the airplane's frame of reference 
when put in technical terms) the ball will appear to move backwards as the plane 
accelerates forwards (the same feeling as being pushed back into your seat as the plane
accelerates). This motion appears to contradict Newton's second law because, from the point of
view of the passengers, there appears to be no force acting on the ball that would cause it
to move. There is in fact no contradiction: Newton's second law (without modification) is
not applicable in this situation, because the airplane is not an inertial frame. Newton's
first law fails there, since the stationary ball does not remain stationary. Thus, it is
important to establish whether the various laws are applicable or not, inasmuch as they are
not applicable in all situations.

History of the Law of Inertia 

Newton's first law is a restatement of what Galileo had already described and Newton gave 
credit to Galileo. It differs from Aristotle's view that all objects have a natural place in the 
universe. Aristotle believed that heavy objects like rocks wanted to be at rest on the Earth 
and that light objects like smoke wanted to be at rest in the sky and the stars wanted to 
remain in the heavens. However, a key difference between Galileo's idea and Aristotle's is 
that Galileo realized that force acting on a body determines acceleration, not velocity. This 
insight leads to Newton's First Law— no force means no acceleration, and hence the body 
will maintain its velocity. 

The law of inertia apparently occurred to several different natural philosophers and 
scientists independently. The inertia of motion was described in the 3rd century BC by the 
Chinese philosopher Mo Tzu, and in the 11th century by the Muslim scientists Alhazen
and Avicenna. The 17th century philosopher René Descartes also formulated the law,
although he did not perform any experiments to confirm it. 

Newton's second law 

Lex II: Mutationem motus proportionalem esse vi motrici impressae, et fieri secundum
lineam rectam qua vis illa imprimitur.

The change of momentum of a body is proportional to the impulse impressed 
on the body, and happens along the straight line on which that impulse is 
impressed. 

In Motte's 1729 translation (from Newton's Latin), the second law of motion reads: 

LAW II: The alteration of motion is ever proportional to the motive force 
impressed; and is made in the direction of the right line in which that force is 
impressed. — If a force generates a motion, a double force will generate double 
the motion, a triple force triple the motion, whether that force be impressed 
altogether and at once, or gradually and successively. And this motion (being 
always directed the same way with the generating force), if the body moved 
before, is added to or subtracted from the former motion, according as they 
directly conspire with or are directly contrary to each other; or obliquely joined, 
when they are oblique, so as to produce a new motion compounded from the 
determination of both. 

Using modern symbolic notation, Newton's second law can be written as a vector 
differential equation: 

$$\mathbf{F} = \frac{d(m\mathbf{v})}{dt} = m\frac{d\mathbf{v}}{dt}$$

where F is the force vector, m is the mass of the body, v is the velocity vector and t is time.
The product of the mass and velocity is momentum (which Newton himself called "quantity
of motion"). Therefore, this equation expresses the physical relationship between force and
momentum for a body with constant mass. Because the law describes the motion of bodies
of constant mass only, the mass can be moved outside the differential operator.

The equation implies that, under zero net force, the momentum of a body is also constant. 
However, any mass that is gained or lost by the body will cause a change in momentum that
is not the result of an external force. This equation does not hold in such cases. See open
systems.

It should be noted that, as is consistent with the law of inertia, the time derivative of the 
momentum is non-zero when the momentum changes direction, even if there is no change
in its magnitude. See time derivative.

By substitution using the definition of acceleration, this differential equation can be
rewritten in a more familiar form

$$\mathbf{F} = m\mathbf{a}$$

where

$$\mathbf{a} = \frac{d\mathbf{v}}{dt}.$$

A verbal equivalent of this is "the acceleration of an object is proportional to the force
applied, and inversely proportional to the mass of the object". In general, at slow speeds
(slow relative to the speed of light), the relationship between momentum and velocity is
approximately linear. Nearly all speeds within the human experience fall within this
category. At higher speeds, however, this approximation becomes increasingly inaccurate
and the theory of special relativity must be applied.

Impulse 

The term impulse is closely related to the second law, and historically speaking is closer to
the original meaning of the law. The meaning of an impulse is as follows:

An impulse occurs when a force F acts over an interval of time Δt and is given by

$$\mathbf{I} = \int_{\Delta t} \mathbf{F}\,dt.$$

The words motive force were used by Newton to describe "impulse" and motion to describe
momentum; consequently, a historically closer reading of the second law describes the
relation between impulse and change of momentum. That is, a mathematical rendering of
the original wording resembles a finite difference version of the second law, such as

$$\mathbf{I} = \Delta\mathbf{p} = m\,\Delta\mathbf{v}$$

where I is the impulse, Δp is the change in momentum, m is the mass, and Δv is the change
in velocity.
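
The relation between impulse and change of velocity can be checked numerically. The following is a minimal sketch, not taken from the source: the function name `impulse`, the puck mass and both force histories are illustrative assumptions. It integrates two different force histories with the same area under the curve and shows that they give the same change in velocity, which echoes Newton's remark that it does not matter whether the force is impressed "altogether and at once, or gradually and successively".

```python
def impulse(force, t_start, t_end, steps=10_000):
    """Approximate the time integral of force(t) over [t_start, t_end] (midpoint rule)."""
    dt = (t_end - t_start) / steps
    return sum(force(t_start + (k + 0.5) * dt) for k in range(steps)) * dt


mass = 0.16  # kg, a hypothetical puck (assumed value)

# Two hypothetical force histories over 0.05 s with the same time integral:
impulse_constant = impulse(lambda t: 8.0, 0.0, 0.05)    # steady 8 N push
impulse_ramp = impulse(lambda t: 320.0 * t, 0.0, 0.05)  # force rising linearly from 0 N to 16 N

# I = m * delta_v for a body of constant mass
delta_v_constant = impulse_constant / mass
delta_v_ramp = impulse_ramp / mass

print(impulse_constant, impulse_ramp)
print(delta_v_constant, delta_v_ramp)
```

Both impulses come out at about 0.4 N s, so both pushes change the puck's velocity by about 2.5 m/s.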

The analysis of collisions and impacts uses the impulse concept.

Relativity

Main article: Special Relativity 

Open systems 

So-called variable mass systems that are not closed systems, like a rocket burning fuel and
ejecting spent gases, cannot be directly treated by making mass a function of time in the
second law. The reasoning, given in An Introduction to Mechanics by Kleppner and
Kolenkow and other modern texts, is that Newton's second law applies fundamentally to
particles. In classical mechanics, particles by definition have constant mass. In the case of
well-defined systems of particles, Newton's law can be extended by summing over all the
particles in the system. In this case, we have to refer all vectors to the center of mass.
Applying the second law to extended objects implicitly assumes the object to be a
well-defined collection of particles. However, 'variable mass' systems like a rocket or a
leaking bucket do not consist of a set number of particles. They are not well-defined
systems. Therefore Newton's second law cannot be applied to them directly.

The general equation of motion for a body whose mass m varies with time by either ejecting 
or accreting mass is obtained by rearranging the second law and adding a term to account 
for the momentum carried by mass entering or leaving the system, 



$$\mathbf{F}_{\mathrm{net}} + \mathbf{u}\frac{dm}{dt} = m\frac{d\mathbf{v}}{dt}$$

where u is the relative velocity of the escaping or incoming mass with respect to the center
of mass of the body. Under some conventions, the quantity u dm/dt on the left-hand side is
defined as a force (the force exerted on the body by the changing mass, such as rocket
exhaust) and is included in the quantity F_net. Then, by substituting the definition of
acceleration, the equation becomes, once again,

$$\mathbf{F}_{\mathrm{net}} = m\mathbf{a}.$$
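
As an illustration, the variable-mass equation of motion can be integrated numerically in one dimension. This is only a sketch under stated assumptions: the function name `rocket_velocity_gain`, the constant burn rate and all numerical values are hypothetical and not from the source. With no external force, the same equation can be integrated by hand to give a velocity gain of |u| ln(m0/m_final), which serves as a check.

```python
import math


def rocket_velocity_gain(m0, burn_rate, u, burn_time, f_net=0.0, steps=100_000):
    """Integrate m dv/dt = F_net + u dm/dt in one dimension (midpoint rule).

    m0         initial mass (kg)
    burn_rate  mass lost per second (kg/s), so dm/dt = -burn_rate
    u          exhaust velocity relative to the body (m/s); negative means backwards
    """
    dt = burn_time / steps
    dm_dt = -burn_rate
    v = 0.0
    for k in range(steps):
        m_mid = m0 + dm_dt * (k + 0.5) * dt  # mass at the middle of the step
        v += (f_net + u * dm_dt) / m_mid * dt
    return v


# Assumed numbers: 1000 kg body, 5 kg/s burn for 100 s, exhaust leaving at 2000 m/s backwards.
dv_numeric = rocket_velocity_gain(m0=1000.0, burn_rate=5.0, u=-2000.0, burn_time=100.0)

# Closed-form result for F_net = 0: |u| * ln(m0 / m_final), with m_final = 500 kg here.
dv_exact = 2000.0 * math.log(1000.0 / 500.0)

print(dv_numeric, dv_exact)
```

The sign convention matters: u is measured relative to the body, so exhaust thrown backwards (negative u) together with a negative dm/dt gives a positive acceleration.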

Newton's third law. The skaters' forces on each other
are equal in magnitude, but act in opposite directions. 



Newton's third law: law of reciprocal actions 

Lex III: Actioni contrariam semper et aequalem esse reactionem: sive corporum 
duorum actiones in se mutuo semper esse aequales et in partes contrarias dirigi.

For a force there is always an equal and opposite reaction: or the forces of two 
bodies on each other are always equal and are directed in opposite directions. 

A more direct translation is: 

LAW III: To every action there is always opposed an equal reaction: or the mutual actions of
two bodies upon each other are always equal, and directed to contrary parts. — Whatever
draws or presses another is as much drawn or pressed by that other. If you press a stone with
your finger, the finger is also pressed by the stone. If a horse draws a stone tied to a rope, the
horse (if I may so say) will be equally drawn back towards the stone: for the distended rope, by
the same endeavour to relax or unbend itself, will draw the horse as much
towards the stone, as it does the stone towards the horse, and will obstruct the
progress of the one as much as it advances that of the other. If a body impinges
upon another, and by its force changes the motion of the other, that body also
(because of the equality of the mutual pressure) will undergo an equal change, in
its own motion, toward the contrary part. The changes made by these actions are
equal, not in the velocities but in the motions of the bodies; that is to say, if the
bodies are not hindered by any other impediments. For, as the motions are
equally changed, the changes of the velocities made toward contrary parts are
reciprocally proportional to the bodies. This law takes place also in attractions, as
will be proved in the next scholium.

In the above, as usual, motion is Newton's name for momentum, hence his careful 
distinction between motion and velocity. 

The Third Law means that all forces are interactions, and thus that there is no such thing as
a unidirectional force. If body A exerts a force on body B, simultaneously, body B exerts a
force of the same magnitude on body A, both forces acting along the same line. As shown in
the diagram opposite, the skaters' forces on each other are equal in magnitude, but act in
opposite directions. Although the forces are equal, the accelerations are not: the less
massive skater will have a greater acceleration due to Newton's second law. It is important
to note that the action and reaction act on different objects and do not cancel each other
out. The two forces in Newton's third law are of the same type (e.g., if the road exerts a
forward frictional force on an accelerating car's tires, then it is also a frictional force that
Newton's third law predicts for the tires pushing backward on the road).

Newton used the third law to derive the law of conservation of momentum; however,
from a deeper perspective, conservation of momentum is the more fundamental idea
(derived via Noether's theorem from invariance under spatial translations), and holds in
cases where Newton's third law appears to fail, for instance when force fields as well as
particles carry momentum, and in quantum mechanics.

Importance and range of validity 

Newton's laws were verified by experiment and observation for over 200 years, and they 
are excellent approximations at the scales and speeds of everyday life. Newton's laws of 
motion, together with his law of universal gravitation and the mathematical techniques of 
calculus, provided for the first time a unified quantitative explanation for a wide range of 
physical phenomena. 

These three laws hold to a good approximation for macroscopic objects under everyday 
conditions. However, Newton's laws (combined with Universal Gravitation and Classical 
Electrodynamics) are inappropriate for use in certain circumstances, most notably at very 
small scales, very high speeds (in special relativity, the Lorentz factor must be included in 
the expression for momentum along with rest mass and velocity) or very strong 
gravitational fields. Therefore, the laws cannot be used to explain phenomena such as 
conduction of electricity in a semiconductor, optical properties of substances, errors in 
non-relativistically corrected GPS systems and superconductivity. Explanation of these 
phenomena requires more sophisticated physical theory, including General Relativity and 
Relativistic Quantum Mechanics. 

In quantum mechanics concepts such as force, momentum, and position are defined by 
linear operators that operate on the quantum state; at speeds that are much lower than the 
speed of light, Newton's laws are just as exact for these operators as they are for classical 
objects. At speeds comparable to the speed of light, the second law holds in the original 
form F = dp/dt, which says that the force is the derivative of the momentum of the object 
with respect to time, but some of the newer versions of the second law (such as the 
constant mass approximation above) do not hold at relativistic velocities. 

Relationship to the conservation laws 

In modern physics, the laws of conservation of momentum, energy, and angular momentum 
are of more general validity than Newton's laws, since they apply to both light and matter, 
and to both classical and non-classical physics. 

This can be stated simply, "Momentum, energy and angular momentum cannot be created 
or destroyed." 

Because force is the time derivative of momentum, the concept of force is redundant and 
subordinate to the conservation of momentum, and is not used in fundamental theories (e.g. 
quantum mechanics, quantum electrodynamics, general relativity, etc.). The standard 
model explains in detail how the three fundamental forces known as gauge forces originate
out of exchange of virtual particles. Other forces such as gravity and fermionic degeneracy
pressure also arise from momentum conservation. Indeed, the conservation of
4-momentum in inertial motion via curved space-time results in what we call gravitational
force in general relativity theory. Application of the space derivative (which is a momentum
operator in quantum mechanics) to the overlapping wave functions of a pair of fermions
(particles with half-integer spin) results in shifts of the maxima of the compound
wavefunction away from each other, which is observable as "repulsion" of fermions.

Newton stated the third law within a world-view that assumed instantaneous action at a 
distance between material particles. However, he was prepared for philosophical criticism 
of this action at a distance, and it was in this context that he stated the famous phrase "I 
feign no hypotheses". In modern physics, action at a distance has been completely 
eliminated, except for subtle effects involving quantum entanglement. However in modern 
engineering in all practical applications involving the motion of vehicles and satellites, the 
concept of action at a distance is used extensively. 

Conservation of energy was discovered nearly two centuries after Newton's lifetime, the 
long delay occurring because of the difficulty in understanding the role of microscopic and 
invisible forms of energy such as heat and infra-red light. 

See also 

• Scientific laws named after people 

• Mercury, orbit of 

• Galilean invariance 

• Modified Newtonian dynamics 

• Lagrangian mechanics 

• Hamiltonian mechanics 

• Principle of least action 

• Euler's laws

Analytical dynamics



In classical mechanics, analytical dynamics, or more briefly dynamics, is concerned
with the relationship between the motion of bodies and its causes, namely the forces acting
on the bodies and the properties of the bodies (particularly mass and moment of inertia). The
foundation of modern day dynamics is Newtonian mechanics and its reformulation as
Lagrangian mechanics and Hamiltonian mechanics. The field has a long and important
history, as remarked by Hamilton:

The theoretical development of the laws of motion of bodies is a problem of such 
interest and importance that it has engaged the attention of all the eminent 
mathematicians since the invention of the dynamics as a mathematical science by 
Galileo, and especially since the wonderful extension which was given to that science 
by Newton 

- William Rowan Hamilton, 1834 (Transcribed in Classical Mechanics by J.R. Taylor, p. 237 [3])

Some authors (for example, Taylor (2005) and Greenwood (1997)) include special
relativity within classical dynamics.

Relationship to statics, kinetics, and kinematics

Historically, there were three branches of classical mechanics: "statics" (the study of 
equilibrium and its relation to forces); "kinetics" (the study of motion and its relation to 
forces) and "kinematics" (dealing with the implications of observed motions without 
regard for circumstances causing them). These three subjects have been connected to 
dynamics in several ways. One approach combined statics and kinetics under the name
dynamics, which became the branch dealing with determination of the motion of bodies
resulting from the action of specified forces; another approach separated statics, and
combined kinetics and kinematics under the rubric dynamics. This approach is
common in engineering books on mechanics, and is still in widespread use among
mechanicians.

Fundamental importance in engineering, diminishing emphasis in 
physics 

Today, dynamics and kinematics continue to be considered the two pillars of classical 
mechanics. Dynamics is still included in mechanical, aerospace, and other engineering 
curriculums because of its importance in machine design, the design of land, sea, air, and 
space vehicles and other applications. However, few modern physicists concern themselves 
with an independent treatment of "dynamics" or "kinematics", never mind "statics" or
"kinetics". Instead, the entire undifferentiated subject is referred to as classical mechanics.
In fact, many undergraduate and graduate textbooks since the mid-20th century on "classical
mechanics" lack chapters titled "dynamics" or "kinematics". [3] [10] [11] [12] [13] [14] [15] [16]
In these books, although the word "dynamics" is used when acceleration is ascribed to a
force, the word "kinetics" is never mentioned. However, clear exceptions exist. Prominent
examples include The Feynman Lectures on Physics.

Fundamental Principles 

• Newton's laws of motion 

• Inertia 

• Acceleration 

• Momentum 

• Reaction 

• Newton's law of universal gravitation 

• Special theory of relativity 

Axioms and mathematical treatments 

• Variational principles and Lagrange's equations 

• Hamilton's equations 

• Canonical transformations 

• Hamilton-Jacobi Theory 

Related engineering branches 

• Particle dynamics 

• Rigid body dynamics 

• Soft body dynamics 

• Fluid dynamics 

• Hydrodynamics 

• Gas dynamics 

• Aerodynamics

Related subjects

• Statics

Molecular dynamics



Molecular dynamics (MD) is a form of computer simulation in which atoms and molecules
are allowed to interact for a period of time by approximations of known physics, giving a
view of the motion of the atoms. Because molecular systems generally consist of a vast
number of particles, it is impossible to find the properties of such complex systems
analytically. When more than two bodies interact, no general analytical solution can be
found and the motion can be chaotic (see n-body problem). MD simulation circumvents this
problem by using numerical methods. It represents an interface between laboratory
experiments and theory, and can be understood as a "virtual experiment". MD probes the
relationship between molecular structure, movement and function. Molecular dynamics is a
multidisciplinary method. Its laws and theories stem from mathematics, physics, and
chemistry, and it employs algorithms from computer science and information theory. It was
originally conceived within theoretical physics in the late 1950s and early 1960s, but
is applied today mostly in materials science and modeling of biomolecules.

Before it became possible to simulate molecular dynamics with computers, some undertook 
the hard work of trying it with physical models such as macroscopic spheres. The idea was 
to arrange them to replicate the properties of a liquid. J.D. Bernal said, in 1962: "... I took a 
number of rubber balls and stuck them together with rods of a selection of different lengths 
ranging from 2.75 to 4 inches. I tried to do this in the first place as casually as possible, 
working in my own office, being interrupted every five minutes or so and not remembering
what I had done before the interruption." Fortunately, now computers keep track of
bonds during a simulation.

Molecular dynamics is a specialized discipline of molecular modeling and computer 
simulation based on statistical mechanics; the main justification of the MD method is that 
statistical ensemble averages are equal to time averages of the system, known as the 
ergodic hypothesis. MD has also been termed "statistical mechanics by numbers" and
"Laplace's vision of Newtonian mechanics" of predicting the future by animating nature's
forces and allowing insight into molecular motion on an atomic scale. However, long
MD simulations are mathematically ill-conditioned, generating cumulative errors in 
numerical integration that can be minimized with proper selection of algorithms and 
parameters, but not eliminated entirely. Furthermore, current potential functions are, in 
many cases, not sufficiently accurate to reproduce the dynamics of molecular systems, so 
the much more computationally demanding Ab Initio Molecular Dynamics method must be 
used. Nevertheless, molecular dynamics techniques allow detailed time and space 
resolution into representative behavior in phase space. 



Figure: Highly simplified description of the molecular dynamics simulation algorithm.

• Give atoms initial positions $\mathbf{r}^{(i=0)}$, choose short $\Delta t$

• Get forces $\mathbf{F} = -\nabla V(\mathbf{r}^{(i)})$ and $\mathbf{a} = \mathbf{F}/m$

• Move atoms: $\mathbf{r}^{(i+1)} = \mathbf{r}^{(i)} + \mathbf{v}^{(i)}\Delta t + \tfrac{1}{2}\mathbf{a}\,\Delta t^2 + \dots$

• Move time forward: $t = t + \Delta t$

• Repeat as long as you need

The simulation proceeds iteratively by alternately calculating forces and solving the
equations of motion based on the accelerations obtained from the new forces. In practice,
almost all MD codes use much more complicated versions of the algorithm, including two
steps (predictor and corrector) in solving the equations of motion and many additional
steps for e.g. temperature and pressure control, analysis and output.
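
The loop in the figure can be written out for a single particle in one dimension. The sketch below is illustrative only: the function name `simulate`, the harmonic force and all parameter values are assumptions made for this example. The figure does not show how the velocities are updated, so the sketch assumes the velocity Verlet form, which averages the old and new accelerations.

```python
def simulate(r0, v0, mass, force, dt, n_steps):
    """Follow the figure's loop for one particle in one dimension.

    force is a function of position returning -dV/dr.
    The velocity update (velocity Verlet) is an assumption; the figure omits it.
    """
    r, v, t = r0, v0, 0.0
    a = force(r) / mass                     # get forces and accelerations
    trajectory = [(t, r, v)]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt * dt  # move atoms
        a_new = force(r) / mass             # forces at the new positions
        v = v + 0.5 * (a + a_new) * dt      # assumed velocity update
        a = a_new
        t += dt                             # move time forward
        trajectory.append((t, r, v))
    return trajectory


# Hypothetical test system: harmonic potential V(r) = k r^2 / 2 with k = m = 1,
# whose exact period is 2*pi. 628 steps of 0.01 cover almost exactly one period.
k = 1.0
trajectory = simulate(r0=1.0, v0=0.0, mass=1.0, force=lambda r: -k * r, dt=0.01, n_steps=628)
t_end, r_end, v_end = trajectory[-1]
print(t_end, r_end, v_end)
```

After one period the particle should be back close to its starting position with nearly zero velocity, which gives a quick sanity check on the time step.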



Areas of Application 

There is a significant difference between the focus and methods used by chemists and
physicists, and this is reflected in differences in the jargon used by the different fields. In
chemistry and biophysics, the interaction between the particles is either described by
a "force field" (classical MD), a quantum chemical model, or a mix between the two. These
terms are not used in physics, where the interactions are usually described by the name
of the theory or approximation being used and called the potential energy, or just "potential".

Beginning in theoretical physics, the method of MD gained popularity in materials science 
and since the 1970s also in biochemistry and biophysics. In chemistry, MD serves as an 
important tool in protein structure determination and refinement using experimental tools 
such as X-ray crystallography and NMR. It has also been applied with limited success as a 
method of refining protein structure predictions. In physics, MD is used to examine the 
dynamics of atomic-level phenomena that cannot be observed directly, such as thin film 
growth and ion-subplantation. It is also used to examine the physical properties of 
nanotechnological devices that have not or cannot yet be created. 

In applied mathematics and theoretical physics, molecular dynamics is a part of the 
research realm of dynamical systems, ergodic theory and statistical mechanics in general. 
The concepts of energy conservation and molecular entropy come from thermodynamics. 
Some techniques to calculate conformational entropy such as principal components analysis 
come from information theory. Mathematical techniques such as the transfer operator 
become applicable when MD is seen as a Markov chain. Also, there is a large community of 
mathematicians working on volume preserving, symplectic integrators for more 
computationally efficient MD simulations.

MD can also be seen as a special case of the discrete element method (DEM) in which the
particles have spherical shape (e.g. with the size of their van der Waals radii.) Some 
authors in the DEM community employ the term MD rather loosely, even when their 
simulations do not model actual molecules. 

Design Constraints 

Design of a molecular dynamics simulation should account for the available computational 
power. Simulation size (n=number of particles), timestep and total time duration must be 
selected so that the calculation can finish within a reasonable time period. However, the 
simulations should be long enough to be relevant to the time scales of the natural processes 
being studied. To make statistically valid conclusions from the simulations, the time span 
simulated should match the kinetics of the natural process. Otherwise, it is analogous to 
making conclusions about how a human walks from less than one footstep. Most scientific 
publications about the dynamics of proteins and DNA use data from simulations spanning 
nanoseconds (1E-9 s) to microseconds (1E-6 s). To obtain these simulations, several 
CPU-days to CPU-years are needed. Parallel algorithms allow the load to be distributed 
among CPUs; an example is the spatial decomposition in LAMMPS. 

During a classical MD simulation, the most CPU intensive task is the evaluation of the
potential (force field) as a function of the particles' internal coordinates. Within that energy
evaluation, the most expensive part is the non-bonded or non-covalent part. In Big O
notation, common molecular dynamics simulations scale as $O(n^2)$ if all pair-wise
electrostatic and van der Waals interactions must be accounted for explicitly. This
computational cost can be reduced by employing electrostatics methods such as Particle
Mesh Ewald ($O(n \log n)$) or good spherical cutoff techniques ($O(n)$).
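
The difference between the all-pairs cost and a cutoff scheme can be sketched in a few lines. The example below is an illustration, not a method named in the source: it assumes a cell list (binning atoms into cubic cells whose side equals the cutoff) as one common way of implementing a spherical cutoff, and the function names and the random test configuration are hypothetical. The brute-force search examines all n(n-1)/2 pairs, while the cell list only compares atoms in neighbouring cells, which at fixed density grows roughly linearly with n.

```python
import itertools
import random
from collections import defaultdict


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairs_brute_force(points, cutoff):
    """Check every pair: n(n-1)/2 distance evaluations."""
    c2 = cutoff * cutoff
    return {(i, j)
            for i, j in itertools.combinations(range(len(points)), 2)
            if squared_distance(points[i], points[j]) < c2}


def pairs_cell_list(points, cutoff):
    """Bin atoms into cubic cells of side `cutoff`; compare only neighbouring cells."""
    c2 = cutoff * cutoff
    cells = defaultdict(list)
    for index, p in enumerate(points):
        cells[tuple(int(c // cutoff) for c in p)].append(index)
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    found = set()
    for cell, members in cells.items():
        for offset in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbour, ()):
                    # i < j counts each pair once and skips self-pairs
                    if i < j and squared_distance(points[i], points[j]) < c2:
                        found.add((i, j))
    return found


# Hypothetical configuration: 300 atoms placed at random in a 10 x 10 x 10 box.
random.seed(0)
atoms = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(300)]
print(len(pairs_brute_force(atoms, 1.5)), len(pairs_cell_list(atoms, 1.5)))
```

Both functions return the same set of interacting pairs; only the number of distance evaluations differs. This sketch uses an open (non-periodic) box to stay short.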

Another factor that impacts the total CPU time required by a simulation is the size of the
integration timestep. This is the time length between evaluations of the potential. The
timestep must be chosen small enough to avoid discretization errors (i.e. smaller than the
period of the fastest vibration in the system). Typical timesteps for classical MD are on the
order of 1 femtosecond (1E-15 s). This value may be extended by using algorithms such as
SHAKE, which fixes the vibrations of the fastest atoms (e.g. hydrogens) into place. Multiple
time scale methods have also been developed, which allow for extended times between
updates of slower long-range forces.

For simulating molecules in a solvent, a choice should be made between explicit solvent and
implicit solvent. Explicit solvent particles (such as the TIP3P and SPC/E water models) must
each be evaluated by the force field, which is costly, while implicit solvents use a mean-field
approach. Using an explicit solvent is computationally expensive, requiring inclusion of
about ten times more particles in the simulation. But the granularity and viscosity of
explicit solvent are essential to reproduce certain properties of the solute molecules. This is
especially important to reproduce kinetics.

In all kinds of molecular dynamics simulations, the simulation box size must be large 
enough to avoid boundary condition artifacts. Boundary conditions are often treated by 
choosing fixed values at the edges, or by employing periodic boundary conditions in which 
one side of the simulation loops back to the opposite side, mimicking a bulk phase. 
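
Periodic boundary conditions are commonly implemented with two small operations: wrapping a coordinate back into the box, and taking the shortest separation between periodic images (the minimum image convention). The following one-dimensional sketch uses hypothetical function names and an assumed box length; it is applied per coordinate for a rectangular box.

```python
def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest separation between two particles, considering periodic images."""
    return dx - box * round(dx / box)


box = 10.0  # assumed box length

# A particle leaving through one side re-enters through the opposite side.
print(wrap(10.5, box), wrap(-0.5, box))

# Particles at x = 1 and x = 9 are only 2 apart across the boundary, not 8.
print(minimum_image(9.0 - 1.0, box))
```

The minimum image convention is only meaningful when the interaction cutoff is no larger than half the box length, which is one reason the box must be large enough.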

Microcanonical ensemble (NVE)

In the microcanonical, or NVE ensemble, the system is isolated from changes in moles
(N), volume (V) and energy (E). It corresponds to an adiabatic process with no heat
exchange. A microcanonical molecular dynamics trajectory may be seen as an exchange of
potential and kinetic energy, with total energy being conserved. For a system of N particles
with coordinates X and velocities V, the following pair of first order differential equations
may be written in Newton's notation as

$$\mathbf{F}(\mathbf{X}) = -\nabla U(\mathbf{X}) = M\dot{\mathbf{V}}(t)$$

$$\mathbf{V}(t) = \dot{\mathbf{X}}(t).$$

The potential energy function U(X) of the system is a function of the particle coordinates
X. It is referred to simply as the "potential" in Physics, or the "force field" in Chemistry.
The first equation comes from Newton's laws; the force F acting on each particle in the
system can be calculated as the negative gradient of U(X).

For every timestep, each particle's position X and velocity V may be integrated with a
symplectic method such as Verlet. The time evolution of X and V is called a trajectory.
Given the initial positions (e.g. from theoretical knowledge) and velocities (e.g. randomized
Gaussian), we can calculate all future (or past) positions and velocities.
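
The exchange between potential and kinetic energy at nearly constant total energy can be seen in a very small example. The sketch below is an assumption-laden illustration rather than anything from the source: it uses two atoms on a line, a Lennard-Jones pair potential in reduced units (epsilon = sigma = mass = 1), the position Verlet recurrence, and central-difference velocities for the energy bookkeeping. All names and starting values are hypothetical.

```python
def lj_potential(r):
    """Lennard-Jones pair potential in reduced units (epsilon = sigma = 1)."""
    inv6 = r ** -6
    return 4.0 * (inv6 * inv6 - inv6)


def lj_force(r):
    """-dU/dr: force on the right-hand atom; positive values are repulsive."""
    inv6 = r ** -6
    return 24.0 * (2.0 * inv6 * inv6 - inv6) / r


def run_nve(x1=0.0, x2=1.5, v1=0.0, v2=0.0, mass=1.0, dt=1e-3, steps=5000):
    """Position Verlet for two atoms on a line; returns the total energy at each step."""
    def accelerations(a, b):
        f = lj_force(b - a)
        return -f / mass, f / mass  # equal and opposite forces on the two atoms

    a1, a2 = accelerations(x1, x2)
    # Verlet needs the previous positions; build them from a backward Taylor step.
    p1 = x1 - v1 * dt + 0.5 * a1 * dt * dt
    p2 = x2 - v2 * dt + 0.5 * a2 * dt * dt
    energies = []
    for _ in range(steps):
        a1, a2 = accelerations(x1, x2)
        n1 = 2.0 * x1 - p1 + a1 * dt * dt
        n2 = 2.0 * x2 - p2 + a2 * dt * dt
        # Central-difference velocities at the current step
        v1 = (n1 - p1) / (2.0 * dt)
        v2 = (n2 - p2) / (2.0 * dt)
        kinetic = 0.5 * mass * (v1 * v1 + v2 * v2)
        energies.append(kinetic + lj_potential(x2 - x1))
        p1, x1 = x1, n1
        p2, x2 = x2, n2
    return energies


energies = run_nve()
print(energies[0], max(energies) - min(energies))
```

The total energy is not conserved exactly but fluctuates within a narrow band whose width shrinks as the timestep is reduced, which is the behaviour expected of a symplectic integrator.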

One frequent source of confusion is the meaning of temperature in MD. Commonly we have 
experience with macroscopic temperatures, which involve a huge number of particles. But 
temperature is a statistical quantity. If there is a large enough number of atoms, statistical 
temperature can be estimated from the instantaneous temperature, which is found by
equating the kinetic energy of the system to n k_B T/2, where n is the number of degrees of
freedom of the system.
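
A short sketch of this estimate follows. It solves the relation between kinetic energy and n k_B T/2 for T, and draws Gaussian velocities for a small and a large set of atoms to show how the estimate depends on system size. The function names, the choice of argon and the atom counts are illustrative assumptions; n is taken as three degrees of freedom per atom minus any constraints.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def instantaneous_temperature(masses, velocities, n_constraints=0):
    """Solve E_kin = n k_B T / 2 for T, with n = 3 * (number of atoms) - constraints."""
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    dof = 3 * len(masses) - n_constraints
    return 2.0 * kinetic / (dof * K_B)


def gaussian_velocities(masses, temperature, rng):
    """Draw each velocity component from a Gaussian of variance k_B T / m."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


rng = random.Random(1)
argon_mass = 39.948 * 1.66053907e-27  # kg, assumed example species

small = [argon_mass] * 10
large = [argon_mass] * 20000
t_small = instantaneous_temperature(small, gaussian_velocities(small, 300.0, rng))
t_large = instantaneous_temperature(large, gaussian_velocities(large, 300.0, rng))
print(t_small, t_large)
```

With only ten atoms the instantaneous value can differ from the target of 300 K by tens of kelvin, while for 20000 atoms it typically lands within a few kelvin.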

A temperature-related phenomenon arises due to the small number of atoms that are used 
in MD simulations. For example, consider simulating the growth of a copper film starting 
with a substrate containing 500 atoms and a deposition energy of 100 eV. In the real world, 
the 100 eV from the deposited atom would rapidly be transported through and shared 
among a large number of atoms (10^10 or more) with no big change in temperature. When
there are only 500 atoms, however, the substrate is almost immediately vaporized by the 
deposition. Something similar happens in biophysical simulations. The temperature of the 
system in NVE is naturally raised when macromolecules such as proteins undergo 
exothermic conformational changes and binding. 

Canonical ensemble (NVT) 

In the canonical ensemble, moles (N), volume (V) and temperature (T) are conserved. It is 
also sometimes called constant temperature molecular dynamics (CTMD). In NVT, the 
energy of endothermic and exothermic processes is exchanged with a thermostat. 

A variety of thermostat methods are available to add and remove energy from the 
boundaries of an MD system in a realistic way, approximating the canonical ensemble. 
Popular techniques to control temperature include the Nosé-Hoover thermostat, the
Berendsen thermostat, and Langevin dynamics. Note that the Berendsen thermostat might 
introduce the flying ice cube effect, which leads to unphysical translations and rotations of 
the simulated system. 
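
As an illustration of how a thermostat adds or removes energy, the Berendsen weak-coupling scheme is commonly written as a rescaling of all velocities by a factor computed at each step. The sketch below assumes that textbook form; the function and parameter names (`tau` for the coupling time, `dt` for the timestep) are chosen here for illustration.

```python
import math


def berendsen_scale(t_inst, t_target, dt, tau):
    """Velocity scaling factor for one step of Berendsen weak coupling.

    t_inst: current instantaneous temperature
    t_target: thermostat (bath) temperature
    dt: integration timestep
    tau: coupling time constant (same units as dt, tau >= dt assumed)
    """
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))
```

Multiplying every velocity by this factor pulls the instantaneous temperature toward the target. A large `tau` gives gentle coupling, while `tau == dt` rescales to the target temperature in a single step.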

Isothermal-Isobaric (NPT) ensemble

In the isothermal-isobaric ensemble, moles (N), pressure (P) and temperature (T) are 
conserved. In addition to a thermostat, a barostat is needed. It corresponds most closely to 
laboratory conditions with a flask open to ambient temperature and pressure. 

In the simulation of biological membranes, isotropic pressure control is not appropriate. 
For lipid bilayers, pressure control occurs under constant membrane area (NPAT) or 
constant surface tension "gamma" (NPγT).

Generalized ensembles 

The replica exchange method is a generalized ensemble. It was originally created to deal 
with the slow dynamics of disordered spin systems. It is also called parallel tempering. The 
replica exchange MD (REMD) formulation tries to overcome the multiple-minima
problem by exchanging the temperature of non-interacting replicas of the system running
at several temperatures.
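
Exchanges between replicas are usually accepted or rejected with a Metropolis-type criterion, which is not spelled out in the text above. The following sketch assumes that standard criterion; the function name, the choice of kcal/mol energy units and the value of the gas constant are illustrative assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K), assumed unit system


def swap_probability(e_i, t_i, e_j, t_j, k_b=K_B):
    """Metropolis acceptance probability for swapping two replicas.

    e_i, e_j: potential energies of the replicas currently at t_i and t_j
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # min(1, exp(delta)), written to avoid overflow for large positive delta
    return 1.0 if delta >= 0.0 else math.exp(delta)
```

A swap is always accepted when the colder replica currently holds the higher-energy configuration, and is accepted with exponentially decreasing probability otherwise.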

Potentials in MD simulations 

A molecular dynamics simulation requires the definition of a potential function, or a 
description of the terms by which the particles in the simulation will interact. In chemistry 
and biology this is usually referred to as a force field. Potentials may be defined at many 
levels of physical accuracy; those most commonly used in chemistry are based on molecular 
mechanics and embody a classical treatment of particle-particle interactions that can 
reproduce structural and conformational changes but usually cannot reproduce chemical 
reactions. 

The reduction from a fully quantum description to a classical potential entails two main 
approximations. The first one is the Born-Oppenheimer approximation, which states that 
the dynamics of electrons is so fast that they can be considered to react instantaneously to 
the motion of their nuclei. As a consequence, they may be treated separately. The second 
one treats the nuclei, which are much heavier than electrons, as point particles that follow 
classical Newtonian dynamics. In classical molecular dynamics the effect of the electrons is 
approximated as a single potential energy surface, usually representing the ground state. 

When finer levels of detail are required, potentials based on quantum mechanics are used; 
some techniques attempt to create hybrid classical/quantum potentials where the bulk of 
the system is treated classically but a small region is treated as a quantum system, usually 
undergoing a chemical transformation. 

Empirical potentials 

Empirical potentials used in chemistry are frequently called force fields, while those used in 
materials physics are called just empirical or analytical potentials. 

Most force fields in chemistry are empirical and consist of a summation of bonded forces 
associated with chemical bonds, bond angles, and bond dihedrals, and non-bonded forces 
associated with van der Waals forces and electrostatic charge. Empirical potentials 
represent quantum-mechanical effects in a limited way through ad-hoc functional 
approximations. These potentials contain free parameters such as atomic charge, van der 
Waals parameters reflecting estimates of atomic radius, and equilibrium bond length, 
angle, and dihedral; these are obtained by fitting against detailed electronic calculations 
(quantum chemical simulations) or experimental physical properties such as elastic
constants, lattice parameters and spectroscopic measurements. 

Because non-bonded interactions are non-local, they involve at least weak
interactions between all particles in the system. Their calculation is normally the bottleneck in
the speed of MD simulations. To lower the computational cost, force fields employ
numerical approximations such as shifted cutoff radii, reaction field algorithms, particle
mesh Ewald summation, or Particle-Particle Particle-Mesh (P3M).
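
The simplest of these approximations, a shifted cutoff, can be sketched as follows. This is a generic illustration under the assumption of a purely distance-dependent pair potential; the helper name `shifted_cutoff` is hypothetical and does not correspond to any particular package.

```python
def shifted_cutoff(potential, r_cut):
    """Wrap a pair potential so it is zero at and beyond r_cut.

    The whole curve is shifted by potential(r_cut), so the energy
    goes continuously to zero at the cutoff instead of jumping.
    """
    u_cut = potential(r_cut)

    def shifted(r):
        return potential(r) - u_cut if r < r_cut else 0.0

    return shifted


# Usage: a Coulomb-like 1/r interaction truncated at r = 2
truncated = shifted_cutoff(lambda r: 1.0 / r, 2.0)
```

Pairs beyond the cutoff contribute nothing and need not be evaluated at all, which is where the saving comes from. The price is that the long-range tail of the interaction is neglected.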

Chemistry force fields commonly employ preset bonding arrangements (an exception being 
ab-initio dynamics), and thus are unable to model the process of chemical bond breaking 
and reactions explicitly. On the other hand, many of the potentials used in physics, such as
those based on the bond order formalism, can describe several different coordinations of a
system and bond breaking. Examples of such potentials include the Brenner potential for
hydrocarbons and its further developments for the C-Si-H and C-O-H systems. The ReaxFF
potential can be considered a fully reactive hybrid between bond order potentials and
chemistry force fields.


Pair potentials vs. many-body potentials 

The potential functions representing the non-bonded energy are formulated as a sum over 
interactions between the particles of the system. The simplest choice, employed in many 
popular force fields, is the "pair potential", in which the total potential energy can be 
calculated from the sum of energy contributions between pairs of atoms. An example of 
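such a pair potential is the non-bonded Lennard-Jones potential (also known as the 6-12
potential), used for calculating van der Waals forces.

$$U(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]$$

Here ε is the depth of the potential well and σ is the distance at which the potential is zero.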
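
A pair potential of this kind translates directly into code. The sketch below evaluates the Lennard-Jones 6-12 form and sums it over all pairs; the function names and the use of reduced units (well depth and zero-crossing distance both equal to 1 by default) are assumptions made for illustration.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 6-12 energy of one pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_pair_energy(positions, epsilon=1.0, sigma=1.0):
    """Total energy as a plain sum over all distinct pairs of atoms."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a, b in combinations(positions, 2)
    )
```

The double loop hidden in `combinations` makes the cost grow with the square of the number of atoms, which is why cutoffs and neighbour lists are used in practice.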



Another example is the Born (ionic) model of the ionic lattice. The first term in the next 
equation is Coulomb's law for a pair of ions, the second term is the short-range repulsion 
explained by Pauli's exclusion principle and the final term is the dispersion interaction 
term. Usually, a simulation only includes the dipolar term, although sometimes the 
quadrupolar term is included as well. 

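$$U_{ij}(r_{ij}) = \sum \frac{z_i z_j}{4\pi\epsilon_0} \frac{1}{r_{ij}} + \sum A_l \exp\left(\frac{-r_{ij}}{p_l}\right) + \sum C_l r_{ij}^{-n_l} + \cdots$$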

In many-body potentials, the potential energy includes the effects of three or more particles 
interacting with each other. In simulations with pairwise potentials, global interactions in 
the system also exist, but they occur only through pairwise terms. In many-body potentials, 
the potential energy cannot be found by a sum over pairs of atoms, as these interactions are 
calculated explicitly as a combination of higher-order terms. In the statistical view, the
dependency between the variables cannot in general be expressed using only pairwise
products of the degrees of freedom. For example, the Tersoff potential [12], which was
originally used to simulate carbon, silicon and germanium and has since been used for a
wide range of other materials, involves a sum over groups of three atoms, with the angles
between the atoms being an important factor in the potential. Other examples are the
embedded-atom method (EAM) [13] and the Tight-Binding Second Moment Approximation
(TBSMA) potentials, where the electron density of states in the region of an atom is
calculated from a sum of contributions from surrounding atoms, and the potential energy
contribution is then a function of this sum.

Semi-empirical potentials

Semi-empirical potentials make use of the matrix representation from quantum mechanics. 
However, the values of the matrix elements are found through empirical formulae that 
estimate the degree of overlap of specific atomic orbitals. The matrix is then diagonalized to 
determine the occupancy of the different atomic orbitals, and empirical formulae are used 
once again to determine the energy contributions of the orbitals. 

There is a wide variety of semi-empirical potentials, known as tight-binding potentials,
which vary according to the atoms being modeled. 

Polarizable potentials 

Most classical force fields implicitly include the effect of polarizability, e.g. by scaling up 
the partial charges obtained from quantum chemical calculations. These partial charges are 
stationary with respect to the mass of the atom. But molecular dynamics simulations can 
explicitly model polarizability with the introduction of induced dipoles through different 
methods, such as Drude particles or fluctuating charges. This allows for a dynamic 
redistribution of charge between atoms which responds to the local chemical environment. 

For many years, polarizable MD simulations have been touted as the next generation. For 
homogeneous liquids such as water, increased accuracy has been achieved through the
inclusion of polarizability. Some promising results have also been achieved for
proteins. However, it is still uncertain how to best approximate polarizability in a
simulation. 

Ab-initio methods 

In classical molecular dynamics, a single potential energy surface (usually the ground state) 
is represented in the force field. This is a consequence of the Born-Oppenheimer 
approximation. If excited states, chemical reactions or a more accurate representation is 
needed, electronic behavior can be obtained from first principles by using a quantum 
mechanical method, such as Density Functional Theory. This is known as Ab Initio 
Molecular Dynamics (AIMD). Due to the cost of treating the electronic degrees of freedom, 
the computational cost of these simulations is much higher than that of classical molecular
dynamics. This implies that AIMD is limited to smaller systems and shorter periods of time. 

Ab-initio quantum-mechanical methods may be used to calculate the potential energy of a 
system on the fly, as needed for conformations in a trajectory. This calculation is usually 
made in the close neighborhood of the reaction coordinate. Although various 
approximations may be used, these are based on theoretical considerations, not on 
empirical fitting. Ab-initio calculations produce a vast amount of information that is not 
available from empirical methods, such as density of electronic states or other electronic 
properties. A significant advantage of using ab-initio methods is the ability to study 
reactions that involve breaking or formation of covalent bonds, which correspond to 
multiple electronic states. 

One popular program for ab-initio molecular dynamics is the Car-Parrinello Molecular
Dynamics (CPMD) package, which is based on density functional theory.

Hybrid QM/MM

QM (quantum-mechanical) methods are very powerful. However, they are computationally
expensive, while the MM (classical or molecular mechanics) methods are fast but suffer
from several limitations (they require extensive parameterization; the energy estimates
obtained are not very accurate; they cannot be used to simulate reactions where covalent
bonds are broken or formed; and they are limited in their ability to provide accurate details
regarding the chemical environment). A new class of methods has emerged that combines
the good points of QM (accuracy) and MM (speed) calculations. These methods are known as
mixed or hybrid quantum-mechanical and molecular mechanics methods (hybrid QM/MM). The
methodology for such techniques was introduced by Warshel and coworkers. In recent
years these methods have been pioneered by several groups including: Arieh Warshel
(University of Southern California), Weitao Yang (Duke University), Sharon Hammes-Schiffer
(The Pennsylvania State University), Donald Truhlar and Jiali Gao (University of Minnesota)
and Kenneth Merz (University of Florida).

The most important advantage of hybrid QM/MM methods is speed. The cost of doing
classical molecular dynamics (MM) in the most straightforward case scales as O(N^2), where N
is the number of atoms in the system. This is mainly due to the electrostatic interaction term
(every particle interacts with every other particle). However, the use of a cutoff radius,
periodic pair-list updates and, more recently, variations of the particle-mesh Ewald (PME)
method have reduced this to between O(N) and O(N^2). In other words, if a system with twice
as many atoms is simulated, it would take between two and four times as much computing
power. On the other hand, the simplest ab-initio calculations typically scale as O(N^3) or worse
(restricted Hartree-Fock calculations have been suggested to scale as ~O(N^2.7)). To overcome
this limitation, a small part of the system is treated quantum-mechanically (typically the
active site of an enzyme) and the remaining system is treated classically.
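
The arithmetic behind these scaling statements is simple enough to check directly. The helper below is a hypothetical illustration, with the exponents 1, 2 and 3 taken as representative assumptions for linear, quadratic and cubic scaling.

```python
def cost_ratio(size_factor, exponent):
    """Factor by which the cost grows when the system size is
    multiplied by size_factor, for a method scaling as O(N**exponent)."""
    return size_factor ** exponent


# Doubling the number of atoms:
linear = cost_ratio(2, 1)     # 2x for an O(N) method
quadratic = cost_ratio(2, 2)  # 4x for an O(N^2) method
cubic = cost_ratio(2, 3)      # 8x for an O(N^3) method
```

The gap widens quickly: a tenfold larger system costs 100 times more under quadratic scaling but 1000 times more under cubic scaling, which is the motivation for restricting the quantum-mechanical treatment to a small region.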

In more sophisticated implementations, QM/MM methods exist to treat both light nuclei
susceptible to quantum effects (such as hydrogens) and electronic states. This allows
generation of hydrogen wave-functions (similar to electronic wave-functions). This
methodology has been useful in investigating phenomena such as hydrogen tunneling. One
example where QM/MM methods have provided new discoveries is the calculation of
hydride transfer in the enzyme liver alcohol dehydrogenase. In this case, tunneling is
important for the hydrogen, as it determines the reaction rate.

Coarse-graining and reduced representations 

At the other end of the detail scale are coarse-grained and lattice models. Instead of
explicitly representing every atom of the system, one uses "pseudo-atoms" to represent
groups of atoms. MD simulations on very large systems may require such large computer
resources that they cannot easily be studied by traditional all-atom methods. Similarly,
simulations of processes on long timescales (beyond about 1 microsecond) are prohibitively
expensive, because they require so many timesteps. In these cases, one can sometimes
tackle the problem by using reduced representations, which are also called coarse-grained
models.

Examples of coarse-graining (CG) methods are discontinuous molecular dynamics
(CG-DMD) and Go-models. Coarse-graining is sometimes done by taking larger
pseudo-atoms. Such united atom approximations have been used in MD simulations of
biological membranes. The aliphatic tails of lipids are represented by a few pseudo-atoms
by gathering 2-4 methylene groups into each pseudo-atom.

The parameterization of these very coarse-grained models must be done empirically, by 
matching the behavior of the model to appropriate experimental data or all-atom 
simulations. Ideally, these parameters should account for both enthalpic and entropic 
contributions to free energy in an implicit way. When coarse-graining is done at higher 
levels, the accuracy of the dynamic description may be less reliable. But very 
coarse-grained models have been used successfully to examine a wide range of questions in 
structural biology. 

Examples of applications of coarse-graining in biophysics: 

• protein folding studies are often carried out using a single (or a few) pseudo-atoms per 
amino acid; 

• DNA supercoiling has been investigated using 1-3 pseudo-atoms per basepair, and at 
even lower resolution; 

• Packaging of double-helical DNA into bacteriophage has been investigated with models 
where one pseudo-atom represents one turn (about 10 basepairs) of the double helix; 

• RNA structure in the ribosome and other large systems has been modeled with one 
pseudo-atom per nucleotide. 

The simplest form of coarse-graining is the "united atom" (sometimes called "extended
atom") representation, which was used in most early MD simulations of proteins, lipids and
nucleic acids. For example, instead of treating all four atoms of a CH3 methyl group
explicitly (or all three atoms of a CH2 methylene group), one represents the whole group
with a single pseudo-atom.
This pseudo-atom must, of course, be properly parameterized so that its van der Waals 
interactions with other groups have the proper distance-dependence. Similar 
considerations apply to the bonds, angles, and torsions in which the pseudo-atom 
participates. In this kind of united atom representation, one typically eliminates all explicit 
hydrogen atoms except those that have the capability to participate in hydrogen bonds
("polar hydrogens"). An example of this is the CHARMM19 force field.

The polar hydrogens are usually retained in the model, because proper treatment of 
hydrogen bonds requires a reasonably accurate description of the directionality and the 
electrostatic interactions between the donor and acceptor groups. A hydroxyl group, for 
example, can be both a hydrogen bond donor and a hydrogen bond acceptor, and it would 
be impossible to treat this with a single OH pseudo-atom. Note that about half the atoms in 
a protein or nucleic acid are nonpolar hydrogens, so the use of united atoms can provide
substantial savings in computer time.
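
The bookkeeping behind a united atom representation can be sketched as below. This is a deliberately simplified illustration: it assumes that a hydrogen is nonpolar exactly when it is bonded to carbon, and the function name, input format and site labels are all hypothetical rather than taken from any force field.

```python
def to_united_atoms(elements, bonds):
    """Collapse carbon-bound hydrogens into their parent carbon.

    elements: dict mapping atom name -> element symbol
    bonds: list of (atom name, atom name) pairs
    Returns a dict mapping each surviving site to a label such as "CH3".
    """
    neighbours = {name: [] for name in elements}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    sites = {}
    for name, element in elements.items():
        if element == "H":
            # Assumption: hydrogens on carbon are nonpolar and are absorbed
            if any(elements[n] == "C" for n in neighbours[name]):
                continue
            sites[name] = "H"  # polar hydrogen kept explicitly
        elif element == "C":
            n_h = sum(1 for n in neighbours[name] if elements[n] == "H")
            sites[name] = "C" if n_h == 0 else "CH" if n_h == 1 else f"CH{n_h}"
        else:
            sites[name] = element
    return sites


# Usage: methanol, six atoms reduced to three interaction sites
methanol = {"C": "C", "H1": "H", "H2": "H", "H3": "H", "O": "O", "HO": "H"}
methanol_bonds = [("C", "H1"), ("C", "H2"), ("C", "H3"), ("C", "O"), ("O", "HO")]
united = to_united_atoms(methanol, methanol_bonds)
```

In a real force field each resulting site would also need its own van der Waals, bond, angle and torsion parameters, as described above.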

Examples of applications 

Molecular dynamics is used in many fields of science. 

• First macromolecular MD simulation published (1977, Size: 500 atoms, Simulation Time:
9.2 ps = 0.0092 ns, Program: CHARMM precursor) Protein: Bovine Pancreatic Trypsin
Inhibitor. This is one of the best studied proteins in terms of folding and kinetics. Its
simulation, published in the journal Nature, paved the way for understanding protein
motion as essential to function and not just accessory.

• MD is the standard method to treat collision cascades in the heat spike regime, i.e. the
effects that energetic neutron and ion irradiation have on solids and solid surfaces.

The following two biophysical examples are not run-of-the-mill MD simulations. They
illustrate almost heroic efforts to produce simulations of a system of very large size (a
complete virus) and very long simulation times (500 microseconds):

• MD simulation of the complete satellite tobacco mosaic virus (STMV) (2006, Size: 1 
million atoms, Simulation time: 50 ns, program: NAMD) This virus is a small, icosahedral 
plant virus which worsens the symptoms of infection by Tobacco Mosaic Virus (TMV). 
Molecular dynamics simulations were used to probe the mechanisms of viral assembly. 
The entire STMV particle consists of 60 identical copies of a single protein that make up 
the viral capsid (coating), and a 1063 nucleotide single stranded RNA genome. One key 
finding is that the capsid is very unstable when there is no RNA inside. The simulation 
would take a single 2006 desktop computer around 35 years to complete. It was thus
done on many processors in parallel with continuous communication between them.

• Folding Simulations of the Villin Headpiece in All-Atom Detail (2006, Size: 20,000 atoms; 
Simulation time: 500 µs = 500,000 ns, Program: Folding@home) This simulation was run
on 200,000 CPUs of participating personal computers around the world. These
computers had the Folding@home program installed, a large-scale distributed computing
effort coordinated by Vijay Pande at Stanford University. The kinetic properties of the
Villin Headpiece protein were probed by using many independent, short trajectories run
by CPUs without continuous real-time communication. One technique employed was the
Pfold value analysis, which measures the probability of folding before unfolding of a 
specific starting conformation. Pfold gives information about transition state structures 
and an ordering of conformations along the folding pathway. Each trajectory in a Pfold 
calculation can be relatively short, but many independent trajectories are needed. 
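
The Pfold value described in the last example is, in essence, a fraction of trajectory outcomes. A minimal sketch of the estimate follows; the function name is hypothetical, and the binomial standard error is an assumption added here to show why many independent trajectories are needed.

```python
import math


def pfold_estimate(outcomes):
    """Estimate Pfold from independent trajectories of one starting conformation.

    outcomes: iterable of booleans, True if the trajectory reached the
    folded state before the unfolded state.
    Returns (estimate, binomial standard error).
    """
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one trajectory is required")
    p = sum(outcomes) / n
    return p, math.sqrt(p * (1.0 - p) / n)
```

Because the standard error shrinks only with the square root of the number of trajectories, halving the uncertainty requires four times as many runs, which suits a distributed computing setting where each run is short and independent.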

Molecular dynamics algorithms 
Integrators 

• Verlet-Stoermer integration 

• Runge-Kutta integration 

• Beeman's algorithm 

• Gear predictor - corrector 

• Constraint algorithms (for constrained systems) 

• Symplectic integrator 
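
To illustrate the integrators listed above, the following sketch implements the velocity form of the Verlet scheme for a single particle in one dimension. The function signature is an assumption made for this example; production codes apply the same update to arrays of coordinates.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in one dimension with velocity Verlet.

    force: function returning the force at position x
    Returns the list of (position, velocity) pairs, i.e. the trajectory.
    """
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Usage: harmonic oscillator with unit mass and unit force constant
traj = velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)
```

For the harmonic oscillator in the usage line, the total energy computed along `traj` oscillates slightly around its initial value but does not drift, which is the practical benefit of a symplectic method.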

Short-range interaction algorithms 

• Cell lists 

• Verlet list 

• Bonded interactions 
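
A cell list can be sketched compactly. The version below is a simplified illustration without periodic boundary conditions, assuming cubic cells whose edge equals the cutoff so that interacting atoms always lie in the same or adjacent cells; the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_cells(positions, r_cut):
    """Assign each atom index to a cubic cell of edge length r_cut."""
    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[tuple(math.floor(c / r_cut) for c in p)].append(i)
    return cells


def neighbour_pairs(positions, r_cut):
    """Return sorted index pairs (i, j), i < j, closer than r_cut."""
    cells = build_cells(positions, r_cut)
    pairs = set()
    for cell, members in cells.items():
        # Only the cell itself and its 26 neighbours can hold partners
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and math.dist(positions[i], positions[j]) < r_cut:
                        pairs.add((i, j))
    return sorted(pairs)
```

Each atom is compared only with atoms in nearby cells, so for a roughly uniform density the work grows linearly with the number of atoms rather than quadratically.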

Long-range interaction algorithms 

• Ewald summation 

• Particle Mesh Ewald (PME) 

• Particle-Particle Particle-Mesh (P3M)

• Reaction Field Method 

Parallelization strategies

• Domain decomposition method (Distribution of system data for parallel computing) 

• Molecular Dynamics - Parallel Algorithms 

Major software for MD simulations 

Abalone (classical, implicit water) 

ABINIT (DFT) 

ACEMD [3] (running on NVIDIA GPUs: heavily optimized with CUDA) 

ADUN [27] (classical, P2P database for simulations)

AMBER (classical) 

Ascalaph (classical, GPU accelerated) 

CASTEP (DFT) 

CPMD (DFT) 

CP2K [29] (DFT) 

CHARMM (classical, the pioneer in MD simulation, extensive analysis tools) 

COSMOS (classical and hybrid QM/MM, quantum-mechanical atomic charges with BPT)

Desmond (classical, parallelization with up to thousands of CPUs)

DL_POLY [31] (classical) 

ESPResSo (classical, coarse-grained, parallel, extensible) 

Fireball [32] (tight-binding DFT) 

GROMACS (classical) 

GROMOS (classical) 

GULP (classical) 

Hippo [33] (classical) 

LAMMPS (classical, large-scale with spatial-decomposition of simulation domain for parallelism)

MDynaMix (classical, parallel) 

MOLDY [25] (classical, parallel) latest release [34] 

Materials Studio [17] (Forcite MD using COMPASS, Dreiding, Universal, cvff and pcff forcefields in serial or parallel, QMERA (QM+MD), ONESTEP (DFT), etc.)

MOSCITO (classical) 

NAMD (classical, parallelization with up to thousands of CPUs)

NEWTON-X (ab initio, surface-hopping dynamics) 

ProtoMol (classical, extensible, includes multigrid electrostatics) 

PWscf (DFT) 

S/PHI/nX [37] (DFT) 

SIESTA (DFT) 

VASP (DFT) 

TINKER (classical) 

YASARA [38] (classical) 

ORAC [39] (classical) 

XMD (classical) 

Related software

• VMD - MD simulation trajectories can be visualized and analyzed. 

• PyMol - Molecular visualization software written in Python

• Packmol - Package for building starting configurations for MD in an automated fashion

• Sirius - Molecular modeling, analysis and visualization of MD trajectories 

• esra - Lightweight molecular modeling and analysis library 
(Java/Jython/Mathematica). 

• Molecular Workbench - Interactive molecular dynamics simulations on your desktop 

• BOSS - MC in OPLS 

Specialized hardware for MD simulations 

• Anton - A specialized, massively parallel supercomputer designed to execute MD 
simulations. 

• MDGRAPE - A special purpose system built for molecular dynamics simulations, 
especially protein structure prediction. 

See also 

Molecular graphics 

Molecular modeling 

Computational chemistry 

Energy drift 

Force field in Chemistry 

Force field implementation 

Monte Carlo method 

Molecular Design software 

Molecular mechanics 

Molecular modeling on GPU 

Protein dynamics 

Implicit solvation 

Car-Parrinello method 

Symplectic numerical integration 

Software for molecular mechanics modeling 

Dynamical systems 

Theoretical chemistry 

Statistical mechanics 

Quantum chemistry 

Discrete element method 

List of nucleic acid simulation software 

CHARMM (Chemistry at HARvard Macromolecular Mechanics) is the name of a
widely used set of force fields for molecular dynamics as well as the name for the molecular
dynamics simulation and analysis package associated with them. The CHARMM
Development Project involves a network of developers throughout the world working with
Martin Karplus and his group at Harvard to develop and maintain the CHARMM program.
Licenses for this software are available, for a fee, to people and groups working in
academia.

The commercial version of CHARMM, called CHARMm (note the lowercase 'm'), is 
available from Accelrys. 

CHARMM force fields 

The CHARMM force fields for proteins include: united-atom (sometimes called "extended 
atom") CHARMM19 [3], all-atom CHARMM22 [4] and its dihedral potential corrected variant
CHARMM22/CMAP. [5] In the CHARMM22 protein force field, the atomic partial charges 
were derived from quantum chemical calculations of the interactions between model 
compounds and water. Furthermore, CHARMM22 is parametrized for the TIP3P explicit 
water model. Nevertheless, it is frequently used with implicit solvents. Recently, a special 
version of CHARMM22/CMAP was reparametrized for consistent use with implicit solvent 
GBSW. [6] 

For DNA, RNA, and lipids, CHARMM27 [7] is used. Some force fields may be combined, for 
example CHARMM22 and CHARMM27 for the simulation of protein-DNA binding. 
Additionally, parameters for NAD+, sugars, fluorinated compounds, etc. may be
downloaded. These force field version numbers refer to the CHARMM version where
they first appeared, but may of course be used with subsequent versions of the CHARMM 
executable program. Likewise, these force fields may be used within other molecular 
dynamics programs that support them. 

CHARMM also includes polarizable force fields using two approaches. One is based on the 
fluctuating charge (FQ) model, also known as Charge Equilibration (CHEQ). The
other is based on the Drude shell or dispersion oscillator model. 

CHARMM molecular dynamics program

The CHARMM program allows generation and analysis of a wide range of molecular 
simulations. The most basic kinds of simulation are minimization of a given structure and 
production runs of a molecular dynamics trajectory. 

More advanced features include free energy perturbation (FEP), quasi-harmonic entropy
estimation, correlation analysis, and combined quantum mechanics and molecular mechanics
(QM/MM) methods.

CHARMM is one of the oldest programs for molecular dynamics. It has accumulated a huge 
number of features, some of which are duplicated under several keywords with slight 
variations. This is an inevitable result of the large number of outlooks and groups working 
on CHARMM throughout the world. The changelog file [13] as well as CHARMM's source 
code are good places to look for the names and affiliations of the main developers. The 
involvement and coordination by Charles L. Brooks III's group at the University of Michigan
is very salient. 

History of the program 

Around 1969, there was considerable interest in developing potential energy functions for 
small molecules. CHARMM originated at Martin Karplus's group at Harvard. Karplus and 
his then graduate student Bruce Gelin decided the time was ripe to develop a program that 
would make it possible to take a given amino acid sequence and a set of coordinates (e.g., 
from the X-ray structure) and to use this information to calculate the energy of the system 
as a function of the atomic positions. Karplus has acknowledged the importance of major 
inputs in the development of the (still nameless) program, including 

• Shneior Lifson's group at the Weizmann Institute, especially from Arieh Warshel who
went to Harvard and brought his consistent force field (CFF) program with him;

• Harold Scheraga's group at Cornell University; and 

• Awareness of Michael Levitt's pioneering energy calculations for proteins 

In the 1980s, a paper finally appeared and CHARMM made its public debut. Gelin's
program had by then been considerably restructured. For the publication, Bob Bruccoleri
came up with the name HARMM (HARvard Macromolecular Mechanics), but it didn't seem
appropriate. So they added a C for Chemistry. Karplus said: "I sometimes wonder if
Bruccoleri's original suggestion would have served as a useful warning to inexperienced
scientists working with the program." CHARMM has continued to grow and the latest 
release of the executable program was made in August 2008 as CHARMM35M. 

Running CHARMM Under Unix/Linux 

The general syntax for using the program is: 
charmm < filename.inp > filename.out

charmm 

The actual name of the program (or script which runs the program) on the computer 
system being used. 

filename.inp

A text file which contains the CHARMM commands. It starts by loading the molecular
topologies (top) and force field (par). Then one loads the molecular structures' 
Cartesian coordinates (e.g. from PDB files). One can then modify the molecules 
(adding hydrogens, changing secondary structure). The calculation section can include 
energy minimization, dynamics production, and analysis tools such as motion and 
energy correlations. 

filename.out

The log file for the CHARMM run, containing echoed commands, and various amounts 
of command output. The output print level may be increased or decreased in general, 
and procedures such as minimization and dynamics have printout frequency 
specifications. The values for temperature, energy, pressure, etc. are output at that
frequency. 
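
The same redirection can be driven from a script, for example when many input files have to be processed. The sketch below uses only the Python standard library and assumes, as in the syntax above, that the program reads its commands from standard input and writes its log to standard output. The function name and the default executable name `charmm` are assumptions, since the actual name depends on the installation.

```python
import subprocess


def run_charmm(input_path, output_path, executable="charmm"):
    """Run `executable < input_path > output_path` and return the exit code.

    executable: name or path of the CHARMM program or wrapper script
    (assumed to be "charmm" here; adjust to the local installation).
    """
    with open(input_path, "rb") as inp, open(output_path, "wb") as out:
        result = subprocess.run([executable], stdin=inp, stdout=out, check=False)
    return result.returncode
```

A call such as `run_charmm("filename.inp", "filename.out")` then corresponds to the command line shown above, and a non-zero return value signals that the log file should be inspected.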

CHARMM and Volunteer Computing 

Docking@Home, hosted by the University of Delaware, is one of the projects that use
BOINC, an open-source platform for distributed computing. It adopts CHARMM to analyze
the atomic details of protein-ligand interactions in terms of molecular dynamics (MD)
simulations and minimizations.

World Community Grid, sponsored by IBM, runs a project called The Clean Energy Project 
[15] which also uses CHARMM. 

See also 

• AMBER 

• Force field implementation 

External links

• Accelrys website (http://www.accelrys.com/) 

• CHARMM website (http://www.charmm.org/) with documentation
(http://www.charmm.org/html/documentation/chmdoc.html) and helpful discussion forums
(http://165.112.184.13//ubbthreads/ubbthreads.php?Cat=)

• CHARMM tutorial (http://www.ch.embnet.org/MD_tutorial/) 

• MacKerell (http://www.pharmacy.umaryland.edu/faculty/amackere/) website
including a Package of force field parameters for CHARMM
(http://mackerell.umaryland.edu/CHARMM_ff_params.html)

• C.Brooks website (http://www.scripps.edu/brooks/) 

• CHARMM page at Harvard (http://yuri.harvard.edu/) 

• Roux website (http://thallium.bsd.uchicago.edu/RouxLab/index.html) 

• Bernard R. Brooks Group Website (http://www.lobos.nih.gov/cbs/index.php) 

• VMD (http://www.ks.uiuc.edu/Research/vmd/) - visualization of CHARMM 
trajectories 

• Sirius (http://sirius.sdsc.edu) - visualization of CHARMM trajectories 

• Docking@Home (http://docking.cis.udel.edu/) 

Statistical mechanics

Statistical mechanics (or statistical thermodynamics) is the application of
probability theory, which includes mathematical tools for dealing with large populations, to 
the field of mechanics, which is concerned with the motion of particles or objects when 
subjected to a force. It provides a framework for relating the microscopic properties of 
individual atoms and molecules to the macroscopic or bulk properties of materials that can 
be observed in everyday life, therefore explaining thermodynamics as a natural result of 
statistics and mechanics (classical and quantum) at the microscopic level. 

It provides a molecular-level interpretation of thermodynamic quantities such as work, 
heat, free energy, and entropy, allowing the thermodynamic properties of bulk materials to 
be related to the spectroscopic data of individual molecules. This ability to make 
macroscopic predictions based on microscopic properties is the main advantage of 
statistical mechanics over classical thermodynamics. Both theories are governed by the 
second law of thermodynamics through the medium of entropy. However, entropy in 
thermodynamics can only be known empirically, whereas in statistical mechanics, it is a 
function of the distribution of the system on its micro-states. 

Statistical thermodynamics was born in 1870 with the work of Austrian physicist Ludwig 
Boltzmann, much of which was collectively published in Boltzmann's 1896 Lectures on Gas
Theory. Boltzmann's original papers on the statistical interpretation of thermodynamics,
the H-theorem, transport theory, thermal equilibrium, the equation of state of gases, and 
similar subjects, occupy about 2,000 pages in the proceedings of the Vienna Academy and 
other societies. The term "statistical thermodynamics" was proposed for use by the 
American thermodynamicist and physical chemist J. Willard Gibbs in 1902. According to 
Gibbs, the term "statistical", in the context of mechanics, i.e. statistical mechanics, was first 
used by the Scottish physicist James Clerk Maxwell in 1871. 

Overview 

The essential problem in statistical thermodynamics is to determine the distribution of a 
given amount of energy E over N identical systems. The goal of statistical 
thermodynamics is to understand and to interpret the measurable macroscopic properties 
of materials in terms of the properties of their constituent particles and the interactions 
between them. This is done by connecting thermodynamic functions to quantum-mechanical
equations. Two central quantities in statistical thermodynamics are the Boltzmann factor 
and the partition function. 

Fundamentals

Central topics covered in statistical thermodynamics include: 

Microstates and configurations 

Boltzmann distribution law 

Partition function, Configuration integral or configurational partition function 

Thermodynamic equilibrium - thermal, mechanical, and chemical. 

Internal degrees of freedom - rotation, vibration, electronic excitation, etc. 

Heat capacity - Einstein solids, polyatomic gases, etc. 

Nernst heat theorem 

Fluctuations 

Gibbs paradox 

Degeneracy 

Lastly, and most importantly, the formal definition of entropy of a thermodynamic system 
from a statistical perspective is called statistical entropy, and is defined as: 

$$ S = k_B \ln \Omega $$

where
$k_B$ is Boltzmann's constant, $1.38066 \times 10^{-23}$ J K$^{-1}$, and
$\Omega$ is the number of microstates corresponding to the observed thermodynamic
macrostate.

A common mistake is taking this formula as a hard general definition of entropy. This 
equation is valid only if each microstate is equally accessible (each microstate has an equal 
probability of occurring). 

Boltzmann Distribution 

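If the system is large, the Boltzmann distribution can be used (the Boltzmann distribution
is an approximate result):

$$ n_i \propto e^{-E_i / k_B T}, $$

where $n_i$ is the number of particles occupying level $i$, which has energy $E_i$. This can
now be used with $\rho_i = n_i / N$, where N is the total number of particles:

$$ \rho_i = \frac{n_i}{N} = \frac{e^{-E_i / k_B T}}{\sum_{j}^{\text{all levels}} e^{-E_j / k_B T}}. $$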

History 

In 1738, Swiss physicist and mathematician Daniel Bernoulli published Hydrodynamica 
which laid the basis for the kinetic theory of gases. In this work, Bernoulli posited the
argument, still used to this day, that gases consist of great numbers of molecules moving in 
all directions, that their impact on a surface causes the gas pressure that we feel, and that 
what we experience as heat is simply the kinetic energy of their motion. 

In 1859, after reading a paper on the diffusion of molecules by Rudolf Clausius, Scottish 
physicist James Clerk Maxwell formulated the Maxwell distribution of molecular velocities, 
which gave the proportion of molecules having a certain velocity in a specific range. This 
was the first-ever statistical law in physics. Five years later, in 1864, Ludwig Boltzmann, 
a young student in Vienna, came across Maxwell's paper and was so inspired by it that he 
spent much of his long and distinguished life developing the subject further.

Hence, the foundations of statistical thermodynamics were laid down in the late 1800s by 
those such as Maxwell, Ludwig Boltzmann, Max Planck, Rudolf Clausius, and Willard Gibbs 
who began to apply statistical and quantum atomic theory to ideal gas bodies. 
Predominantly, however, it was Maxwell and Boltzmann, working independently, who 
reached similar conclusions as to the statistical nature of gaseous bodies. Yet, one must 
consider Boltzmann to be the "father" of statistical thermodynamics with his 1875 
derivation of the relationship between entropy S and multiplicity $\Omega$, the number of
microscopic arrangements (microstates) producing the same macroscopic state
(macrostate) for a particular system.

Fundamental postulate 

The fundamental postulate in statistical mechanics (also known as the equal a priori 
probability postulate) is the following: 

Given an isolated system in equilibrium, it is found with equal probability in each of its 
accessible microstates. 

This postulate is a fundamental assumption in statistical mechanics - it states that a system 
in equilibrium does not have any preference for any of its available microstates. Given $\Omega$
microstates at a particular energy, the probability of finding the system in a particular
microstate is $p = 1/\Omega$.

This postulate is necessary because it allows one to conclude that for a system at 
equilibrium, the thermodynamic state (macrostate) which could result from the largest 
number of microstates is also the most probable macrostate of the system. 
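
The link between equal microstate probabilities and the most probable macrostate can be illustrated with a small counting sketch. The coin model below is an assumption chosen for illustration (it does not appear in the text): each microstate is one head/tail sequence of `n_coins` coins, and a macrostate is the total number of heads.

```python
import math

def multiplicity(n_total: int, n_heads: int) -> int:
    """Number of microstates (head/tail sequences) with exactly n_heads heads."""
    return math.comb(n_total, n_heads)

def macrostate_probabilities(n_total: int) -> list:
    """Probability of each macrostate when all 2**n_total microstates are equally likely."""
    total_microstates = 2 ** n_total
    return [multiplicity(n_total, k) / total_microstates for k in range(n_total + 1)]

n_coins = 20  # illustrative system size (hypothetical)
probs = macrostate_probabilities(n_coins)

# The macrostate realised by the largest number of microstates is the most probable one.
most_probable = max(range(n_coins + 1), key=lambda k: probs[k])
print(most_probable, multiplicity(n_coins, most_probable))
```

Under these assumptions the half-heads macrostate has the largest multiplicity, which is the sense in which the postulate singles out the most probable macrostate.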

The postulate is justified in part, for classical systems, by Liouville's theorem (Hamiltonian), 
which shows that if the distribution of system points through accessible phase space is 
uniform at some time, it remains so at later times. 

Similar justification for a discrete system is provided by the mechanism of detailed balance. 

This allows for the definition of the information function (in the context of information 
theory): 

$$ I = -\sum_i \rho_i \ln \rho_i = -\langle \ln \rho \rangle. $$

When all the probabilities $\rho_i$ (rhos) are equal, I is maximal, and we have minimal information
about the system. When our information is maximal (i.e., one rho is equal to one and the
rest to zero, such that we know what state the system is in), the function is minimal.
This "information function" is the same as the reduced entropic function in
thermodynamics.
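
A minimal sketch of the two limiting cases described above, assuming the sign convention in which the function is largest for equal probabilities. The function name and the example distributions are hypothetical.

```python
import math

def information_function(probabilities):
    """Return I = -sum_i rho_i ln(rho_i); zero-probability states contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

n_states = 8
uniform = [1.0 / n_states] * n_states          # all rhos equal: least information
certain = [1.0] + [0.0] * (n_states - 1)       # one rho equal to one: full information
skewed = [0.5, 0.3, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0]

i_uniform = information_function(uniform)      # equals ln(n_states)
i_certain = information_function(certain)      # equals 0
i_skewed = information_function(skewed)        # lies between the two limits
```

For equal probabilities the value reduces to the logarithm of the number of states, which is the microcanonical entropy divided by Boltzmann's constant.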

Statistical ensembles

Microcanonical ensemble 

In the microcanonical ensemble, N, V and E are fixed. Since the second law of thermodynamics
applies to isolated systems, the first case investigated will correspond to this case. The
microcanonical ensemble describes an isolated system.

The entropy of such a system can only increase, so that the maximum of its entropy 
corresponds to an equilibrium state for the system. 

Because an isolated system keeps a constant energy, the total energy of the system does 
not fluctuate. Thus, the system can access only those of its micro-states that correspond to 
a given value E of the energy. The internal energy of the system is then strictly equal to its 
energy. 

Let us call $\Omega(E)$ the number of micro-states corresponding to this value of the system's
energy. The macroscopic state of maximal entropy for the system is the one in which all
micro-states are equally likely to occur, with probability $1/\Omega(E)$, during the system's
fluctuations.

$$ S = -k_B \sum_{i=1}^{\Omega(E)} \frac{1}{\Omega(E)} \ln \frac{1}{\Omega(E)} = k_B \ln \Omega(E) $$

where
S is the system entropy, and
$k_B$ is Boltzmann's constant.

Canonical ensemble 

In the canonical ensemble, N, V and T are fixed. Invoking the concept of the canonical
ensemble, it is possible to derive the probability $P_i$ that a macroscopic system in thermal
equilibrium with its environment will be in a given microstate with energy $E_i$ according to
the Boltzmann distribution:

$$ P_i = \frac{e^{-\beta E_i}}{\sum_{j=1}^{j_{\max}} e^{-\beta E_j}}, $$

where $\beta = \frac{1}{k_B T}$.

The temperature T arises from the fact that the system is in thermal equilibrium with its 
environment. The probabilities of the various microstates must add to one, and the 
normalization factor in the denominator is the canonical partition function: 

$$ Z = \sum_{j=1}^{j_{\max}} e^{-\beta E_j}, $$

where $E_i$ is the energy of the $i$-th microstate of the system. The partition function is a
measure of the number of states accessible to the system at a given temperature. The 
article canonical ensemble contains a derivation of Boltzmann's factor and the form of the 
partition function from first principles. 
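
As a minimal numerical sketch of the Boltzmann probabilities and their normalization, the snippet below evaluates them for a small set of energy levels. The three-level system and the function names are hypothetical, and the value of Boltzmann's constant is the current SI value rather than the rounded one quoted earlier.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (SI value)

def partition_function(energies, temperature):
    """Canonical partition function Z = sum_j exp(-E_j / (k_B T))."""
    beta = 1.0 / (K_B * temperature)
    return sum(math.exp(-beta * e) for e in energies)

def boltzmann_probabilities(energies, temperature):
    """Probability of each microstate: exp(-beta E_i) / Z."""
    beta = 1.0 / (K_B * temperature)
    z = partition_function(energies, temperature)
    return [math.exp(-beta * e) / z for e in energies]

# Hypothetical three-level system with level spacing k_B * 300 K
levels = [0.0, K_B * 300.0, 2.0 * K_B * 300.0]
p_300 = boltzmann_probabilities(levels, 300.0)
z_300 = partition_function(levels, 300.0)
```

The probabilities sum to one by construction, and lower-energy states are always more probable at any finite positive temperature.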

To sum up, the probability of finding a system at temperature T in a particular state with
energy $E_i$ is

$$ P_i = \frac{e^{-\beta E_i}}{Z}. $$

Thermodynamic Connection

The partition function can be used to find the expected (average) value of any microscopic 
property of the system, which can then be related to macroscopic variables. For instance, 
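the expected value of the microscopic energy E is interpreted as the microscopic definition
of the thermodynamic variable internal energy U, and can be obtained by taking the
derivative of the partition function with respect to $\beta$ (and hence the temperature). Indeed,

$$ \langle E \rangle = \frac{\sum_i E_i e^{-\beta E_i}}{Z} = -\frac{1}{Z} \frac{dZ}{d\beta} $$

implies, together with the interpretation of $\langle E \rangle$ as U, the following microscopic definition
of internal energy:

$$ U = -\frac{d \ln Z}{d\beta} = k_B T^2 \frac{d \ln Z}{dT}. $$

The entropy can be calculated by (see Shannon entropy)

$$ \frac{S}{k_B} = -\sum_i P_i \ln P_i = \sum_i \frac{e^{-\beta E_i}}{Z} \left( \beta E_i + \ln Z \right) = \ln Z + \beta U, $$

which implies that

$$ -\frac{\ln Z}{\beta} = U - TS = F $$

is the free energy of the system or, in other words,

$$ Z = e^{-\beta F}. $$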

Having microscopic expressions for the basic thermodynamic potentials U (internal
energy), S (entropy) and F (free energy) is sufficient to derive expressions for other
thermodynamic quantities. The basic strategy is as follows. There may be an intensive or
extensive quantity that enters explicitly in the expression for the microscopic energy $E_i$,
for instance magnetic field (intensive) or volume (extensive). Then, the conjugate
thermodynamic variables are derivatives of the internal energy. The macroscopic
magnetization (extensive) is the derivative of U with respect to the (intensive) magnetic
field, and the pressure (intensive) is the derivative of U with respect to volume (extensive).
The treatment in this section assumes no exchange of matter (i.e. fixed mass and fixed 
particle numbers). However, the volume of the system is variable which means the density 
is also variable. 
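
The relations between the partition function, internal energy, entropy and free energy can be checked numerically. The sketch below assumes units in which Boltzmann's constant is 1 and uses a hypothetical two-level system; it compares the internal energy from an explicit average with a finite-difference derivative of ln Z with respect to beta.

```python
import math

def ln_z(energies, beta):
    """Logarithm of the canonical partition function (k_B = 1)."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def thermodynamics(energies, beta):
    """Return (U, S, F) computed from explicit sums over microstates, with k_B = 1."""
    z = sum(math.exp(-beta * e) for e in energies)
    probs = [math.exp(-beta * e) / z for e in energies]
    u = sum(p * e for p, e in zip(probs, energies))   # U = <E>
    s = -sum(p * math.log(p) for p in probs)          # S = -sum p ln p
    f = -math.log(z) / beta                           # F = -ln(Z) / beta
    return u, s, f

energies = [0.0, 1.0]   # hypothetical two-level system
beta = 2.0              # inverse temperature, so T = 1 / beta
u, s, f = thermodynamics(energies, beta)

# Central finite difference for U = -d ln Z / d beta
h = 1e-6
u_numeric = -(ln_z(energies, beta + h) - ln_z(energies, beta - h)) / (2.0 * h)
```

Within rounding error, the averaged energy agrees with the derivative of ln Z, the entropy equals ln Z + beta U, and F equals U - TS.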

This probability can be used to find the average value, which corresponds to the
macroscopic value, of any property, J, that depends on the energetic state of the system
by using the formula:

$$ \langle J \rangle = \sum_i P_i J_i = \sum_i J_i \frac{e^{-\beta E_i}}{Z}, $$

where $\langle J \rangle$ is the average value of property J. This equation can be applied to the internal
energy, U:

$$ U = \sum_i E_i \frac{e^{-\beta E_i}}{Z}. $$

Subsequently, these equations can be combined with known thermodynamic relationships
between U and V to arrive at an expression for pressure in terms of only temperature,
volume and the partition function. Similar relationships in terms of the partition function
can be derived for other thermodynamic properties as shown in the following table; see also
the detailed explanation in configuration integral [6].



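Helmholtz free energy:            $F = -\frac{\ln Z}{\beta}$

Internal energy:                  $U = -\left( \frac{\partial \ln Z}{\partial \beta} \right)_{N,V}$

Pressure:                         $P = -\left( \frac{\partial F}{\partial V} \right)_{N,T} = \frac{1}{\beta} \left( \frac{\partial \ln Z}{\partial V} \right)_{N,T}$

Entropy:                          $S = k_B (\ln Z + \beta U)$

Gibbs free energy:                $G = F + PV = -\frac{\ln Z}{\beta} + \frac{V}{\beta} \left( \frac{\partial \ln Z}{\partial V} \right)_{N,T}$

Enthalpy:                         $H = U + PV$

Constant volume heat capacity:    $C_V = \left( \frac{\partial U}{\partial T} \right)_{N,V}$

Constant pressure heat capacity:  $C_P = \left( \frac{\partial H}{\partial T} \right)_{N,P}$

Chemical potential:               $\mu_i = -\frac{1}{\beta} \left( \frac{\partial \ln Z}{\partial N_i} \right)_{T,V,N_{j \neq i}}$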



To clarify, this is not a grand canonical ensemble. 

It is often useful to consider the energy of a given molecule to be distributed among a 
number of modes. For example, translational energy refers to that portion of energy 
associated with the motion of the center of mass of the molecule. Configurational energy 
refers to that portion of energy associated with the various attractive and repulsive forces 
between molecules in a system. The other modes are all considered to be internal to each 
molecule. They include rotational, vibrational, electronic and nuclear modes. If we assume 
that each mode is independent (a questionable assumption) the total energy can be 
expressed as the sum of each of the components: 

$$ E = E_t + E_c + E_n + E_e + E_r + E_v, $$

where the subscripts t, c, n, e, r, and v correspond to translational, configurational,
nuclear, electronic, rotational and vibrational modes, respectively. The relationship in this
equation can be substituted into the expression for the partition function to give:

$$ Z = \sum_i e^{-\beta (E_{ti} + E_{ci} + E_{ni} + E_{ei} + E_{ri} + E_{vi})} = \sum_i e^{-\beta E_{ti}} e^{-\beta E_{ci}} e^{-\beta E_{ni}} e^{-\beta E_{ei}} e^{-\beta E_{ri}} e^{-\beta E_{vi}}. $$

If we can assume all these modes are completely uncoupled and uncorrelated, so all these
factors are in a probability sense completely independent, then

$$ Z = Z_t Z_c Z_n Z_e Z_r Z_v. $$

Thus a partition function can be defined for each mode. Simple expressions have been 
derived relating each of the various modes to various measurable molecular properties, 
such as the characteristic rotational or vibrational frequencies. 
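
The factorization into per-mode partition functions can be verified directly for a small example. In the sketch below the two sets of levels are hypothetical stand-ins for two independent modes, and beta is given in arbitrary units; the combined system's states are all pairs of levels with additive energies.

```python
import math
from itertools import product

def partition_function(energies, beta):
    """Sum of Boltzmann factors over the supplied energy levels."""
    return sum(math.exp(-beta * e) for e in energies)

rot_levels = [0.0, 0.2, 0.6]        # hypothetical "rotational" levels
vib_levels = [0.5, 1.5, 2.5, 3.5]   # hypothetical "vibrational" levels
beta = 1.3

# Independent modes: every pair of levels is a state, and the energies add.
combined_levels = [e_r + e_v for e_r, e_v in product(rot_levels, vib_levels)]

z_combined = partition_function(combined_levels, beta)
z_product = partition_function(rot_levels, beta) * partition_function(vib_levels, beta)
```

The two values agree to rounding error because the exponential of a sum is a product of exponentials; any coupling term between the modes would break this equality.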

Expressions for the various molecular partition functions are shown in the following table. 



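Nuclear:                      $Z_n = 1 \quad (T < 10^8\ \mathrm{K})$

Electronic:                   $Z_e = W_0 e^{D_e / k_B T} + W_1 e^{-\theta_{e1}/T} + \cdots$

Vibrational:                  $Z_v = \prod_j \frac{e^{-\theta_{vj}/2T}}{1 - e^{-\theta_{vj}/T}}$

Rotational (linear):          $Z_r = \frac{T}{\sigma \theta_r}$

Rotational (non-linear):      $Z_r = \frac{1}{\sigma} \sqrt{\frac{\pi T^3}{\theta_A \theta_B \theta_C}}$

Translational:                $Z_t = \frac{(2 \pi m k_B T)^{3/2}}{h^3}$

Configurational (ideal gas):  $Z_c = V$

Here each $\theta$ is the characteristic temperature of the corresponding mode, $\sigma$ is the
symmetry number, $W_j$ is the degeneracy of electronic level j, $D_e$ is the dissociation
energy, m is the molecular mass and h is Planck's constant.
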

These equations can be combined with those in the first table to determine the contribution 
of a particular energy mode to a thermodynamic property. For example the "rotational 
pressure" could be determined in this manner. The total pressure could be found by 
summing the pressure contributions from all of the individual modes, i.e.:

$$ P = P_t + P_c + P_n + P_e + P_r + P_v. $$
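
As a worked example of a per-mode pressure, consider the configurational contribution for an ideal gas. The sketch below assumes N identical, independent molecules whose total partition function is the per-molecule product raised to the power N and divided by N! (this combination rule is an assumption introduced here, not stated in the text), and applies the pressure relation from the first table.

```latex
\begin{align}
\ln Z &= N \ln\left( Z_t Z_c Z_n Z_e Z_r Z_v \right) - \ln N! \\
P_c &= \frac{1}{\beta} \left( \frac{\partial \left( N \ln Z_c \right)}{\partial V} \right)_{N,T}
     = \frac{N}{\beta} \frac{\partial \ln V}{\partial V}
     = \frac{N k_B T}{V}
\end{align}
```

With the tabulated forms, only the configurational factor depends on the volume, so under these assumptions the other modes contribute no pressure and the total reduces to the ideal gas law.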



Grand canonical ensemble 

In the grand canonical ensemble, V, T and the chemical potential are fixed. If the system under
study is an open system (matter can be exchanged), so that particle number is not conserved,
we would have to introduce chemical potentials, $\mu_j$, $j = 1, \ldots, n$, and replace the canonical
partition function with the grand canonical partition function:

$$ \Xi(V, T, \mu) = \sum_i \exp\left( \beta \left[ \sum_{j=1}^{n} \mu_j N_{ij} - E_i \right] \right), $$

where $N_{ij}$ is the number of particles of the $j$-th species in the $i$-th configuration. Sometimes, we also
have other variables to add to the partition function, one corresponding to each conserved 
quantity. Most of them, however, can be safely interpreted as chemical potentials. In most 
condensed matter systems, things are nonrelativistic and mass is conserved. However, most 
condensed matter systems of interest also conserve particle number approximately 
(metastably) and the mass (nonrelativistically) is none other than the sum of the number of 
each type of particle times its mass. Mass is inversely related to density, which is the 
conjugate variable to pressure. For the rest of this article, we will ignore this complication 
and pretend chemical potentials don't matter. See grand canonical ensemble. 

Let's rework everything using a grand canonical ensemble this time. The volume is left 
fixed and does not figure in at all in this treatment. As before, j is the index for those 
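particles of species j and i is the index for microstate i:

$$ \Xi = \sum_i \exp\left( -\beta \left( E_i - \sum_j \mu_j N_{ij} \right) \right) $$

$$ \langle N_j \rangle = \frac{1}{\Xi} \sum_i N_{ij} \exp\left( -\beta \left( E_i - \sum_k \mu_k N_{ik} \right) \right) $$

Grand potential:        $\Phi_G = -\frac{\ln \Xi}{\beta}$

Internal energy:        $U = -\left( \frac{\partial \ln \Xi}{\partial \beta} \right)_{\mu} + \sum_j \frac{\mu_j}{\beta} \left( \frac{\partial \ln \Xi}{\partial \mu_j} \right)_{\beta}$

Particle number:        $N_j = \frac{1}{\beta} \left( \frac{\partial \ln \Xi}{\partial \mu_j} \right)_{\beta}$

Entropy:                $S = k_B \left( \ln \Xi + \beta U - \beta \sum_j \mu_j N_j \right)$

Helmholtz free energy:  $F = \Phi_G + \sum_j \mu_j N_j = -\frac{\ln \Xi}{\beta} + \sum_j \frac{\mu_j}{\beta} \left( \frac{\partial \ln \Xi}{\partial \mu_j} \right)_{\beta}$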
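
The particle-number relation can be checked on the simplest open system. The sketch below assumes a hypothetical single site that is either empty or singly occupied, one particle species, and units in which Boltzmann's constant is 1; it compares the directly averaged occupation with the derivative of the logarithm of the grand partition function with respect to the chemical potential.

```python
import math

def ln_xi(states, beta, mu):
    """ln of the grand partition function; states is a list of (energy, particle_number)."""
    return math.log(sum(math.exp(-beta * (e - mu * n)) for e, n in states))

def mean_particle_number(states, beta, mu):
    """Average particle number from the explicit weighted sum over microstates."""
    weights = [math.exp(-beta * (e - mu * n)) for e, n in states]
    xi = sum(weights)
    return sum(w * n for w, (_, n) in zip(weights, states)) / xi

# Hypothetical single site: empty (E = 0, N = 0) or singly occupied (E = eps, N = 1)
eps = 1.0
states = [(0.0, 0), (eps, 1)]
beta, mu = 2.0, 0.7

n_direct = mean_particle_number(states, beta, mu)

# N = (1 / beta) * d ln(Xi) / d mu, evaluated by a central finite difference
h = 1e-6
n_from_derivative = (ln_xi(states, beta, mu + h) - ln_xi(states, beta, mu - h)) / (2.0 * h * beta)

# Closed form for this two-state site
n_closed_form = 1.0 / (math.exp(beta * (eps - mu)) + 1.0)
```

All three values coincide to numerical precision for this model, which is a useful sanity check before applying the same derivative relations to larger state spaces.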



Equivalence between descriptions at the thermodynamic limit 

All of the above descriptions differ in the way they allow the given system to fluctuate 
between its configurations. 

In the micro-canonical ensemble, the system exchanges no energy with the outside world, 
and is therefore not subject to energy fluctuations; in the canonical ensemble, the system is 
free to exchange energy with the outside in the form of heat. 

In the thermodynamic limit, which is the limit of large systems, fluctuations become 
negligible, so that all these descriptions converge to the same description. In other words, 
the macroscopic behavior of a system does not depend on the particular ensemble used for 
its description. 

Given these considerations, the best ensemble to choose for the calculation of the 
properties of a macroscopic system is that ensemble which allows the result to be derived 
most easily. 
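
The claim that fluctuations become negligible for large systems can be made concrete with a short calculation. The sketch assumes N independent two-level particles in the canonical ensemble, with level spacing `eps` and units in which Boltzmann's constant is 1 (a model chosen here for illustration); because the particles are independent, the mean and the variance of the total energy both grow linearly with N.

```python
import math

def relative_energy_fluctuation(n_particles, beta, eps=1.0):
    """Standard deviation of the total energy divided by its mean."""
    p_excited = math.exp(-beta * eps) / (1.0 + math.exp(-beta * eps))
    mean_energy = n_particles * eps * p_excited
    variance = n_particles * eps ** 2 * p_excited * (1.0 - p_excited)
    return math.sqrt(variance) / mean_energy

sizes = [10 ** 2, 10 ** 4, 10 ** 6]
ratios = [relative_energy_fluctuation(n, beta=1.0) for n in sizes]
```

The ratio falls as one over the square root of N, so each hundredfold increase in system size reduces the relative energy fluctuation tenfold.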

Random walks 

The study of long chain polymers has been a source of problems within the realms of 
statistical mechanics since about the 1950s. One reason scientists
were interested in their study, however, is that the equations governing the behaviour of a polymer
chain are independent of the chain chemistry. What is more, the governing equation turns
out to be a random (diffusive) walk in space. Indeed, the Schrödinger equation is itself a
diffusion equation in imaginary time, $t' = it$.

Random walks in time 

The first example of a random walk is one that unfolds in time, whereby a particle undergoes a random
motion due to external forces in its surrounding medium. A typical example would be a
pollen grain in a beaker of water. If one could somehow "dye" the path the pollen grain has
taken, the path observed is defined as a random walk.

Consider a toy problem, of a train moving along a 1D track in the x-direction. Suppose that
the train moves a fixed distance b in either the positive or the negative direction, depending on
whether a coin lands heads or tails when flipped. Let's start by considering the statistics of the
steps the toy train takes (where $S_i$ is the $i$-th step taken):

$$ \langle S_i \rangle = 0; \quad \text{due to a priori equal probabilities} $$
$$ \langle S_i S_j \rangle = b^2 \delta_{ij}. $$

The second quantity is known as the correlation function. The delta is the Kronecker delta,
which tells us that if the indices i and j are different, then the result is 0, but if i = j then the
Kronecker delta is 1, so the correlation function returns a value of $b^2$. This makes sense,
because if i = j then we are considering the same step. Rather trivially then it can be shown
that the average displacement of the train on the x-axis is 0;

$$ x = \sum_{i=1}^{N} S_i, \qquad \langle x \rangle = \left\langle \sum_{i=1}^{N} S_i \right\rangle = \sum_{i=1}^{N} \langle S_i \rangle. $$

As stated, $\langle S_i \rangle$ is 0, so the sum is still 0. The same method can be used to
calculate the root mean square displacement. The result of this calculation is given below:

$$ x_{\mathrm{rms}} = \sqrt{\langle x^2 \rangle} = b \sqrt{N}. $$

From the diffusion equation it can be shown that the distance a diffusing particle moves in
a medium is proportional to the root of the time the system has been diffusing for, where the
proportionality constant is the root of the diffusion constant. The above relation, although
cosmetically different, reveals similar physics, where N is simply the number of steps moved
(and is loosely connected with time) and b is the characteristic step length. As a consequence
we can consider diffusion as a random walk process.
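
The toy train statistics can be reproduced with a short Monte Carlo sketch. The function name, the number of walks and the fixed seed below are arbitrary choices made for illustration; the estimate carries sampling noise, so it should only match the step length times the square root of N approximately.

```python
import math
import random

def simulate_rms_displacement(n_steps, step_length, n_walks, seed=0):
    """Estimate the mean and root-mean-square displacement of a 1D coin-flip walk."""
    rng = random.Random(seed)
    sum_x = 0.0
    sum_x2 = 0.0
    for _ in range(n_walks):
        # Each step is +b or -b with equal probability.
        x = sum(rng.choice((-step_length, step_length)) for _ in range(n_steps))
        sum_x += x
        sum_x2 += x * x
    mean_x = sum_x / n_walks
    rms_x = math.sqrt(sum_x2 / n_walks)
    return mean_x, rms_x

n_steps, b = 100, 1.0
mean_x, rms_x = simulate_rms_displacement(n_steps, b, n_walks=5000)
expected_rms = b * math.sqrt(n_steps)
print(mean_x, rms_x, expected_rms)
```

Quadrupling `n_steps` should roughly double the estimated root-mean-square displacement, which mirrors the square-root-of-time growth of diffusion.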

Random walks in space 

Random walks in space can be thought of as snapshots of the path taken by a random 
walker in time. One such example is the spatial configuration of long chain polymers. 

There are two types of random walk in space: self-avoiding random walks, where the links 
of the polymer chain interact and do not overlap in space, and pure random walks, where 
the links of the polymer chain are non-interacting and links are free to lie on top of one 
another. The former type is most applicable to physical systems, but their solutions are 
harder to get at from first principles. 

By considering a freely jointed, non-interacting polymer chain, the end-to-end vector is
$\mathbf{R} = \sum_{i=1}^{N} \mathbf{r}_i$, where $\mathbf{r}_i$ is the vector position of the $i$-th link in the chain
and each link has length b. As a result of the
central limit theorem, if $N \gg 1$ then we expect a Gaussian distribution for the end-to-end
vector. We can also make statements of the statistics of the links themselves;

$$ \langle \mathbf{r}_i \rangle = 0; \quad \text{by the isotropy of space} $$
$$ \langle \mathbf{r}_i \cdot \mathbf{r}_j \rangle = b^2 \delta_{ij}; \quad \text{all the links in the chain are uncorrelated with one another} $$

Using the statistics of the individual links, it is easily shown that $\langle \mathbf{R} \rangle = 0$ and
$\langle \mathbf{R} \cdot \mathbf{R} \rangle = N b^2$. Notice this last result is the same as that found for random walks in time.

Assuming, as stated, that the distribution of end-to-end vectors for a very large number of
identical polymer chains is Gaussian, the probability distribution has the following form

$$ P = \left( \frac{3}{2 \pi N b^2} \right)^{3/2} \exp\left( -\frac{3 \mathbf{R} \cdot \mathbf{R}}{2 N b^2} \right). $$

What use is this to us? Recall that according to the principle of equally likely a priori 
probabilities, the number of microstates, $\Omega$, at some physical value is directly proportional
to the probability distribution at that physical value, viz;

$$ \Omega(\mathbf{R}) = c P(\mathbf{R}) $$
where c is an arbitrary proportionality constant. Given our distribution function, there is a
maximum corresponding to $\mathbf{R} = 0$. Physically this amounts to there being more microstates
which have an end-to-end vector of 0 than any other value. Now by considering

$$ S(\mathbf{R}) = k_B \ln \Omega(\mathbf{R}) $$
$$ \Delta S(\mathbf{R}) = S(\mathbf{R}) - S(0) $$
$$ \Delta F = -T \Delta S(\mathbf{R}) $$

where F is the Helmholtz free energy, it is trivial to show that

$$ \Delta F = k_B T \frac{3 R^2}{2 N b^2} = \frac{1}{2} K R^2; \qquad K = \frac{3 k_B T}{N b^2}. $$

A Hookean spring!

This result is known as the Entropic Spring Result and amounts to saying that upon 
stretching a polymer chain you are doing work on the system to drag it away from its 
(preferred) equilibrium state. An example of this is a common elastic band, composed of 
long chain (rubber) polymers. By stretching the elastic band you are doing work on the 
system and the band behaves like a conventional spring. What is particularly astonishing 
about this result however, is that the work done in stretching the polymer chain can be 
related entirely to the change in entropy of the system as a result of the stretching. 
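
The entropic spring picture can be probed with a small simulation. The sketch below assumes a freely jointed chain in which every link has the same fixed length b and an independent, isotropic orientation, and it measures energies in units of the thermal energy (Boltzmann's constant times T); these modelling choices, the function names and the seed are illustrative assumptions. For a Gaussian distribution with exponent $-3\mathbf{R}\cdot\mathbf{R}/2Nb^2$, the variance of one Cartesian component is $Nb^2/3$, so the spring constant can be estimated as the thermal energy divided by that variance.

```python
import math
import random

def random_unit_vector(rng):
    """Draw an isotropic unit vector by normalising three Gaussian components."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return x / norm, y / norm, z / norm

def mean_square_end_to_end(n_links, link_length, n_chains, seed=0):
    """Return (<R.R>, <R_x^2>) averaged over independently generated chains."""
    rng = random.Random(seed)
    total_r2 = 0.0
    total_x2 = 0.0
    for _ in range(n_chains):
        rx = ry = rz = 0.0
        for _ in range(n_links):
            ux, uy, uz = random_unit_vector(rng)
            rx += link_length * ux
            ry += link_length * uy
            rz += link_length * uz
        total_r2 += rx * rx + ry * ry + rz * rz
        total_x2 += rx * rx
    return total_r2 / n_chains, total_x2 / n_chains

n_links, b = 50, 1.0
r2, x2 = mean_square_end_to_end(n_links, b, n_chains=4000)

k_b_t = 1.0                                      # thermal energy sets the energy unit
spring_constant_estimate = k_b_t / x2            # from the variance of one component
spring_constant_theory = 3.0 * k_b_t / (n_links * b ** 2)
```

The estimate is subject to sampling noise, but it should approach the theoretical value as the number of chains grows, and it shows that longer chains act as softer springs.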

Classical thermodynamics vs. statistical thermodynamics 

As an example, from a classical thermodynamics point of view one might ask what it is
about a thermodynamic system of gas molecules, such as ammonia (NH3), that determines
the free energy characteristic of that compound. Classical thermodynamics does not
provide the answer. If, for example, we were given spectroscopic data of this body of gas
molecules, such as bond length, bond angle, bond rotation, and flexibility of the bonds in
NH3, we should see that the free energy could not be other than it is. To prove this true, we
need to bridge the gap between the microscopic realm of atoms and molecules and the
macroscopic realm of classical thermodynamics. From physics, statistical mechanics
provides such a bridge by teaching us how to conceive of a thermodynamic system as an
assembly of units. More specifically, it demonstrates how the thermodynamic parameters of
a system, such as temperature and pressure, are interpretable in terms of the parameters
descriptive of such constituent atoms and molecules. [7]

In a bounded system, the crucial characteristic of these microscopic units is that their 
energies are quantized. That is, where the energies accessible to a macroscopic system 
form a virtual continuum of possibilities, the energies open to any of its submicroscopic 
components are limited to a discontinuous set of alternatives associated with integral 
values of some quantum number. 

Notes

[1] The terms "statistical mechanics" and "statistical thermodynamics" are used interchangeably. "Statistical physics" is a broader term which includes statistical mechanics, but is sometimes also used as a synonym for statistical mechanics.
[2] On history of fundamentals of statistical thermodynamics (http://www.worldscibooks.com/phy_etextbook/2012/2012_chap01.pdf) (section 1.2)
[3] Schrödinger, Erwin (1946). Statistical Thermodynamics. Dover Publications, Inc. ISBN 0-486-66101-6. OCLC 20056858 (http://worldcat.org/oclc/20056858).
[4] Mahon, Basil (2003). The Man Who Changed Everything - the Life of James Clerk Maxwell. Hoboken, NJ: Wiley. ISBN 0-470-86171-1. OCLC 52358254 62045217 (http://worldcat.org/oclc/52358254+62045217).
[5] Perrot, Pierre (1998). A to Z of Thermodynamics. Oxford University Press. ISBN 0-19-856552-6. OCLC 123283342 38073404 (http://worldcat.org/oclc/123283342+38073404).
[6] http://clesm.mae.ufl.edu/wiki.pub/index.php/Configuration_integral_%28statistical_mechanics%29
[7] Nash, Leonard K. (1974). Elements of Statistical Thermodynamics, 2nd Ed. Dover Publications, Inc. ISBN 0-486-44978-5. OCLC 61513215 (http://worldcat.org/oclc/61513215).

External links

• Philosophy of Statistical Mechanics (http://plato.stanford.edu/entries/ 
statphys-statmech/) article by Lawrence Sklar for the Stanford Encyclopedia of 
Philosophy. 

• Sklogwiki - Thermodynamics, statistical mechanics, and the computer simulation of 
materials, (http://www.sklogwiki.org/) SklogWiki is particularly orientated towards 
liquids and soft condensed matter. 

• Statistical Thermodynamics (http://history.hyperjeff.net/statmech.html) - Historical 
Timeline 



Statistical field theory 



A statistical field theory is any model in statistical mechanics where the degrees of 
freedom comprise a field or fields. In other words, the microstates of the system are 
expressed through field configurations. It is closely related to quantum field theory, which 
describes the quantum mechanics of fields, and shares with it many phenomena, such as 
renormalization. If the system involves polymers, it is also known as polymer field theory. 

In fact, by performing a Wick rotation from Minkowski space to Euclidean space, many 
results of statistical field theory can be applied directly to its quantum equivalent. The 
correlation functions of a statistical field theory are called Schwinger functions, and their 
properties are described by the Osterwalder-Schrader axioms. 

Statistical field theories are widely used to describe systems in polymer physics or 
biophysics, such as polymer films, nanostructured block copolymers or polyelectrolytes.

External links

• Problems in Statistical Field Theory (http://www.gursey.gov.tr/~mgh/rg2006/ 
problemsets.html) 

• Particle and Polymer Field Theory Group (http://www-dick.chemie.uni-regensburg.de/ 
group/stephanbaeurle/) 

Computational chemistry 

Computational chemistry is a branch of chemistry that uses computers to assist in 
solving chemical problems. It uses the results of theoretical chemistry, incorporated into 
efficient computer programs, to calculate the structures and properties of molecules and 
solids. While its results normally complement the information obtained by chemical 
experiments, it can in some cases predict hitherto unobserved chemical phenomena. It is 
widely used in the design of new drugs and materials. 

Examples of such properties are structure (i.e. the expected positions of the constituent 
atoms), absolute and relative (interaction) energies, electronic charge distributions, dipoles 
and higher multipole moments, vibrational frequencies, reactivity or other spectroscopic 
quantities, and cross sections for collision with other particles. 

The methods employed cover both static and dynamic situations. In all cases the computer 
time and other resources (such as memory and disk space) increase rapidly with the size of 
the system being studied. That system can be a single molecule, a group of molecules, or a 
solid. Computational chemistry methods range from highly accurate to very approximate; 
highly accurate methods are typically feasible only for small systems. Ab initio methods are 
based entirely on theory from first principles. Other (typically less accurate) methods are 
called empirical or semi-empirical because they employ experimental results, often from 
acceptable models of atoms or related molecules, to approximate some elements of the 
underlying theory. 

Both ab initio and semi-empirical approaches involve approximations. These range from 
simplified forms of the first-principles equations that are easier or faster to solve, to 
approximations limiting the size of the system (for example, Periodic boundary conditions), 
to fundamental approximations to the underlying equations that are required to achieve any 
solution to them at all. For example, most ab initio calculations make the 
Born-Oppenheimer approximation, which greatly simplifies the underlying Schrödinger
equation by freezing the nuclei in place during the calculation. In principle, ab initio
methods eventually converge to the exact solution of the underlying equations as the 
number of approximations is reduced. In practice, however, it is impossible to eliminate all 
approximations, and residual error inevitably remains. The goal of computational chemistry 
is to minimize this residual error while keeping the calculations tractable. 

History 

Building on the founding discoveries and theories in the history of quantum mechanics, the 
first theoretical calculations in chemistry were those of Walter Heitler and Fritz London in 
1927. The books that were influential in the early development of computational quantum 
chemistry include: Linus Pauling and E. Bright Wilson's 1935 Introduction to Quantum 
Mechanics - with Applications to Chemistry, Eyring, Walter and Kimball's 1944 Quantum 
Chemistry, Heitler's 1945 Elementary Wave Mechanics - with Applications to Quantum
Chemistry, and later Coulson's 1952 textbook Valence, each of which served as primary 
references for chemists in the decades to follow. 

With the development of efficient computer technology in the 1940s, the solutions of 
elaborate wave equations for complex atomic systems began to be a realizable objective. In 
the early 1950s, the first semi-empirical atomic orbital calculations were carried out. 
Theoretical chemists became extensive users of the early digital computers. A very detailed 
account of such use in the United Kingdom is given by Smith and Sutcliffe. The first ab 
initio Hartree-Fock calculations on diatomic molecules were carried out in 1956 at MIT, 
using a basis set of Slater orbitals. For diatomic molecules, a systematic study using a 
minimum basis set and the first calculation with a larger basis set were published by Ransil 
and Nesbet respectively in 1960. The first polyatomic calculations using Gaussian orbitals 
were carried out in the late 1950s. The first configuration interaction calculations were 
carried out in Cambridge on the EDSAC computer in the 1950s using Gaussian orbitals by 
Boys and coworkers. By 1971, when a bibliography of ab initio calculations was
published, the largest molecules included were naphthalene and azulene. Abstracts
of many earlier developments in ab initio theory have been published by Schaefer.

In 1964, Hückel method calculations (using a simple linear combination of atomic orbitals
(LCAO) method for the determination of electron energies of molecular orbitals of π
electrons in conjugated hydrocarbon systems) of molecules ranging in complexity from 
butadiene and benzene to ovalene, were generated on computers at Berkeley and Oxford. 
These empirical methods were replaced in the 1960s by semi-empirical methods such as 
CNDO. [9] 

In the early 1970s, efficient ab initio computer programs such as ATMOL, GAUSSIAN,
IBMOL, and POLYATOM began to be used to speed up ab initio calculations of molecular
orbitals. Of these four programs, only GAUSSIAN, now massively expanded, is still in use,
but many other programs are now in use. At the same time, the methods of molecular
mechanics, such as MM2, were developed, primarily by Norman Allinger.

One of the first mentions of the term "computational chemistry" can be found in the 1970 
book Computers and Their Role in the Physical Sciences by Sidney Fernbach and Abraham 
Haskell Taub, where they state "It seems, therefore, that 'computational chemistry' can
finally be more and more of a reality." During the 1970s, widely different methods began
to be seen as part of a new emerging discipline of computational chemistry. The Journal
of Computational Chemistry was first published in 1980.

Concepts 

The term theoretical chemistry may be defined as a mathematical description of chemistry, 
whereas computational chemistry is usually used when a mathematical method is 
sufficiently well developed that it can be automated for implementation on a computer. 
Note that the words exact and perfect do not appear here, as very few aspects of chemistry 
can be computed exactly. However, almost every aspect of chemistry can be described in a 
qualitative or approximate quantitative computational scheme. 

Molecules consist of nuclei and electrons, so the methods of quantum mechanics apply. 
Computational chemists often attempt to solve the non-relativistic Schrödinger equation,
with relativistic corrections added, although some progress has been made in solving the
fully relativistic Dirac equation. In principle, it is possible to solve the Schrödinger equation
in either its time-dependent or time-independent form, as appropriate for the problem at
hand; in practice, this is not possible except for very small systems. Therefore, a great
number of approximate methods strive to achieve the best trade-off between accuracy and 
computational cost. Accuracy can always be improved with greater computational cost. 
Significant errors can present themselves in ab initio models comprising many electrons,
due to the computational expense of fully relativistic methods. This complicates the
study of molecules containing atoms of high atomic mass, such as transition
metals, and of their catalytic properties. Present algorithms in computational chemistry can
routinely calculate the properties of molecules that contain up to about 40 electrons with
sufficient accuracy. Errors for energies can be less than a few kJ/mol. For geometries, bond
lengths can be predicted within a few picometres and bond angles within 0.5 degrees. The 
treatment of larger molecules that contain a few dozen electrons is computationally 
tractable by approximate methods such as density functional theory (DFT). There is some 
dispute within the field whether or not the latter methods are sufficient to describe complex 
chemical reactions, such as those in biochemistry. Large molecules can be studied by 
semi-empirical approximate methods. Even larger molecules are treated by classical 
mechanics methods that employ what are called molecular mechanics. In QM/MM methods, 
small portions of large complexes are treated quantum mechanically (QM), and the 
remainder is treated approximately (MM). 

In theoretical chemistry, chemists, physicists and mathematicians develop algorithms and 
computer programs to predict atomic and molecular properties and reaction paths for 
chemical reactions. Computational chemists, in contrast, may simply apply existing 
computer programs and methodologies to specific chemical questions. There are two 
different aspects to computational chemistry: 

• Computational studies can be carried out in order to find a starting point for a laboratory 
synthesis, or to assist in understanding experimental data, such as the position and 
source of spectroscopic peaks. 

• Computational studies can be used to predict the possibility of so far entirely unknown 
molecules or to explore reaction mechanisms that are not readily studied by experimental 
means. 

Thus, computational chemistry can assist the experimental chemist or it can challenge the 
experimental chemist to find entirely new chemical objects. 

Several major areas may be distinguished within computational chemistry: 

• The prediction of the molecular structure of molecules by the use of the simulation of 
forces, or more accurate quantum chemical methods, to find stationary points on the 
energy surface as the position of the nuclei is varied. 

• Storing and searching for data on chemical entities (see chemical databases). 

• Identifying correlations between chemical structures and properties (see QSPR and 
QSAR). 

• Computational approaches to help in the efficient synthesis of compounds. 

• Computational approaches to design molecules that interact in specific ways with other 
molecules (e.g. drug design and catalysis). 
Methods

A single molecular formula can represent a number of molecular isomers. Each isomer is a 
local minimum on the energy surface (called the potential energy surface) created from the 
total energy (i.e., the electronic energy, plus the repulsion energy between the nuclei) as a 
function of the coordinates of all the nuclei. A stationary point is a geometry such that the 
derivative of the energy with respect to all displacements of the nuclei is zero. A local 
(energy) minimum is a stationary point where all such displacements lead to an increase in 
energy. The local minimum that is lowest is called the global minimum and corresponds to 
the most stable isomer. If there is one particular coordinate change that leads to a decrease 
in the total energy in both directions, the stationary point is a transition structure and the 
coordinate is the reaction coordinate. This process of determining stationary points is 
called geometry optimization. 
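
As a minimal sketch of geometry optimization, the following Python code locates the energy minimum of a one-coordinate model: a diatomic molecule described by a Morse potential. The Morse form, its parameter values and the steepest-descent scheme are illustrative assumptions and are not taken from any particular program; production codes use more elaborate optimizers and many coordinates.

```python
import math

# Hypothetical Morse parameters for an illustrative diatomic (arbitrary units).
WELL_DEPTH = 1.0      # D: depth of the potential well
WIDTH = 1.0           # a: controls how narrow the well is
R_EQUILIBRIUM = 1.0   # r_e: bond length at the energy minimum


def morse_energy(r):
    """Potential energy of the model diatomic at bond length r."""
    return WELL_DEPTH * (1.0 - math.exp(-WIDTH * (r - R_EQUILIBRIUM))) ** 2


def morse_gradient(r):
    """Analytic first derivative dE/dr of the Morse potential."""
    decay = math.exp(-WIDTH * (r - R_EQUILIBRIUM))
    return 2.0 * WELL_DEPTH * WIDTH * (1.0 - decay) * decay


def optimize_geometry(r_start, step=0.1, tol=1e-8, max_iter=10_000):
    """Steepest descent: move against dE/dr until the gradient vanishes."""
    r = r_start
    for _ in range(max_iter):
        gradient = morse_gradient(r)
        if abs(gradient) < tol:
            break  # stationary point reached within tolerance
        r -= step * gradient
    return r


if __name__ == "__main__":
    r_opt = optimize_geometry(1.5)
    print(f"optimized bond length: {r_opt:.6f}, energy: {morse_energy(r_opt):.2e}")
```

Starting from a stretched bond, the search converges to the stationary point at the equilibrium bond length, where the derivative of the energy is zero.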

The determination of molecular structure by geometry optimization became routine only 
after efficient methods for calculating the first derivatives of the energy with respect to all 
atomic coordinates became available. Evaluation of the related second derivatives allows 
the prediction of vibrational frequencies if harmonic motion is assumed. More importantly,
it allows for the characterization of stationary points. The frequencies are related to the 
eigenvalues of the Hessian matrix, which contains second derivatives. If the eigenvalues are 
all positive, then the frequencies are all real and the stationary point is a local minimum. If 
one eigenvalue is negative (i.e., an imaginary frequency), then the stationary point is a 
transition structure. If more than one eigenvalue is negative, then the stationary point is a 
more complex one, and is usually of little interest. When one of these is found, it is 
necessary to move the search away from it if the experimenter is looking solely for local 
minima and transition structures. 
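
The classification rule can be illustrated on a toy two-coordinate energy surface, E(x, y) = x^4 - 2x^2 + y^2, which has stationary points at (-1, 0), (0, 0) and (1, 0). The surface and the function names below are assumptions chosen for illustration; a real molecule has 3N coordinates, and the zero eigenvalues belonging to overall translation and rotation must be set aside first.

```python
import math


def hessian(x, y):
    """Analytic Hessian of the toy surface E(x, y) = x**4 - 2*x**2 + y**2."""
    return ((12.0 * x * x - 4.0, 0.0),
            (0.0, 2.0))


def eigenvalues_2x2(h):
    """Eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    (a, b), (_, d) = h
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)


def classify(h, tol=1e-8):
    """Classify a stationary point by the number of negative eigenvalues."""
    negatives = sum(1 for value in eigenvalues_2x2(h) if value < -tol)
    if negatives == 0:
        return "local minimum"
    if negatives == 1:
        return "transition structure"
    return "higher-order saddle point"


if __name__ == "__main__":
    for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]:
        print(point, classify(hessian(*point)))
```

The two outer points have only positive eigenvalues and are minima, while the central point has exactly one negative eigenvalue and is the transition structure connecting them.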

The total energy is determined by approximate solutions of the time-independent Schrödinger
equation, usually with no relativistic terms included, and by making use of the
Born-Oppenheimer approximation, which allows for the separation of electronic and
nuclear motions, thereby simplifying the Schrödinger equation. This leads to the evaluation
of the total energy as a sum of the electronic energy at fixed nuclei positions and the
repulsion energy of the nuclei. Notable exceptions are certain approaches called direct
quantum chemistry, which treat electrons and nuclei on a common footing. Density
functional methods and semi-empirical methods are variants on the major theme. For very 
large systems, the relative total energies can be compared using molecular mechanics. The 
ways of determining the total energy to predict molecular structures are: 

Ab initio methods 

The programs used in computational chemistry are based on many different 
quantum-chemical methods that solve the molecular Schrödinger equation associated with
the molecular Hamiltonian. Methods that do not include any empirical or semi-empirical 
parameters in their equations - being derived directly from theoretical principles, with no 
inclusion of experimental data - are called ab initio methods. This does not imply that the 
solution is an exact one; they are all approximate quantum mechanical calculations. It 
means that a particular approximation is rigorously defined on first principles (quantum 
theory) and then solved within an error margin that is qualitatively known beforehand. If 
numerical iterative methods have to be employed, the aim is to iterate until full machine 
accuracy is obtained (the best that is possible with a finite word length on the computer, 
and within the mathematical and/or physical approximations made).

The simplest type of ab initio electronic structure calculation is the Hartree-Fock (HF)
scheme, an extension of molecular orbital theory, in which the correlated electron-electron
repulsion is not specifically taken into account; only its average effect is included in the
calculation. As the basis set size is increased, the energy and wave function tend towards a
limit called the Hartree-Fock limit. Many types of calculations (known as post-Hartree-Fock
methods) begin with a Hartree-Fock calculation and subsequently correct for electron-electron
repulsion, referred to also as electronic correlation. As these methods are pushed to the
limit, they approach the exact solution of the non-relativistic Schrödinger equation. In order
to obtain exact agreement with experiment, it is necessary to include relativistic and
spin-orbit terms, both of which are only really important for heavy atoms. In all of these
approaches, in addition to the choice of method, it is necessary to choose a basis set. This is
a set of functions, usually centered on the different atoms in the molecule, which are used
to expand the molecular orbitals with the LCAO ansatz. Ab initio methods need to define a
level of theory (the method) and a basis set.

The Hartree-Fock wave function is a single configuration or determinant. In some cases, 
particularly for bond breaking processes, this is quite inadequate, and several 
configurations need to be used. Here, the coefficients of the configurations and the 
coefficients of the basis functions are optimized together. 

The total molecular energy can be evaluated as a function of the molecular geometry; in 
other words, the potential energy surface. Such a surface can be used for reaction 
dynamics. The stationary points of the surface lead to predictions of different isomers and 
the transition structures for conversion between isomers, but these can be determined 
without a full knowledge of the complete surface. 

A particularly important objective, called computational thermochemistry, is to calculate 
thermochemical quantities such as the enthalpy of formation to chemical accuracy. 
Chemical accuracy is the accuracy required to make realistic chemical predictions and is 
generally considered to be 1 kcal/mol or 4 kJ/mol. To reach that accuracy in an economic
way it is necessary to use a series of post-Hartree-Fock methods and combine the results. 
These methods are called quantum chemistry composite methods. 
Density Functional methods

Density functional theory (DFT) methods are often considered to be ab initio methods for 
determining the molecular electronic structure, even though many of the most common 
functionals use parameters derived from empirical data, or from more complex calculations. 
In DFT, the total energy is expressed in terms of the total one-electron density rather than 
the wave function. In this type of calculation, there is an approximate Hamiltonian and an 
approximate expression for the total electron density. DFT methods can be very accurate 
for little computational cost. Some methods combine the density functional exchange 
functional with the Hartree-Fock exchange term and are known as hybrid functional 
methods. 

Semi-empirical and empirical methods 

Semi-empirical quantum chemistry methods are based on the Hartree-Fock formalism, but 
make many approximations and obtain some parameters from empirical data. They are very 
important in computational chemistry for treating large molecules where the full 
Hartree-Fock method without the approximations is too expensive. The use of empirical 
parameters appears to allow some inclusion of correlation effects into the methods. 

Semi-empirical methods follow what are often called empirical methods, where the
two-electron part of the Hamiltonian is not explicitly included. For π-electron systems, this
was the Hückel method proposed by Erich Hückel, and for all valence electron systems, the
Extended Hückel method proposed by Roald Hoffmann.
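
For a linear conjugated chain of n carbon atoms, the Hückel matrix has the well-known closed-form orbital energies E_k = α + 2β cos(kπ/(n+1)), k = 1, ..., n, where α and β are the usual Coulomb and resonance parameters (β is negative). The sketch below evaluates this formula; the function names are illustrative, and the restriction to linear, neutral, closed-shell chains is an assumption made to keep the example short.

```python
import math


def huckel_coefficients(n_carbons):
    """Return x_k such that E_k = alpha + x_k * beta for a linear chain."""
    return [2.0 * math.cos(k * math.pi / (n_carbons + 1))
            for k in range(1, n_carbons + 1)]


def total_pi_coefficient(n_carbons):
    """Coefficient of beta in the total pi-electron energy.

    Assumes a neutral closed-shell chain: n electrons fill the n/2
    lowest orbitals in pairs. Because beta is negative, the lowest
    orbitals are those with the largest x_k.
    """
    if n_carbons % 2:
        raise ValueError("closed-shell sketch needs an even number of carbons")
    occupied = sorted(huckel_coefficients(n_carbons), reverse=True)[: n_carbons // 2]
    return 2.0 * sum(occupied)


if __name__ == "__main__":
    for name, n in [("ethylene", 2), ("butadiene", 4)]:
        print(f"{name}: E_pi = {n}*alpha + {total_pi_coefficient(n):.3f}*beta")
```

For butadiene the π energy is 4α + 4.472β, compared with 4α + 4β for two isolated ethylene units; the difference is the Hückel delocalization energy.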

Molecular mechanics 

In many cases, large molecular systems can be modeled successfully while avoiding 
quantum mechanical calculations entirely. Molecular mechanics simulations, for example, 
use a single classical expression for the energy of a compound, for instance the harmonic 
oscillator. All constants appearing in the equations must be obtained beforehand from 
experimental data or ab initio calculations. 
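
A minimal sketch of such a classical energy expression is the harmonic bond-stretching term, E = Σ ½ k (r - r0)², summed over all bonds. The force constants, reference lengths and data layout below are hypothetical; real force fields add angle, torsion and non-bonded terms.

```python
import math


def bond_energy(coords, bonds):
    """Sum of harmonic bond-stretch terms 0.5 * k * (r - r0)**2.

    coords: list of (x, y, z) atomic positions.
    bonds:  list of (i, j, k_bond, r0) with atom indices, force constant
            and reference bond length (hypothetical parameter values).
    """
    energy = 0.0
    for i, j, k_bond, r0 in bonds:
        r = math.dist(coords[i], coords[j])  # current bond length
        energy += 0.5 * k_bond * (r - r0) ** 2
    return energy


if __name__ == "__main__":
    # Two atoms stretched 0.1 length units beyond their reference distance.
    coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
    bonds = [(0, 1, 100.0, 1.0)]
    print(f"bond-stretch energy: {bond_energy(coords, bonds):.3f}")
```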

The database of compounds used for parameterization is crucial to the success of molecular
mechanics calculations; the resulting set of parameters and functions is called the force
field. A force field parameterized against a specific class of molecules, for instance
proteins, would be expected to have relevance only when describing other molecules of
the same class.

These methods can be applied to proteins and other large biological molecules, and allow
studies of the approach and interaction (docking) of potential drug molecules (e.g. [13] and
[14]).

Methods for solids 

Computational chemical methods can be applied to solid state physics problems. The 
electronic structure of a crystal is in general described by a band structure, which defines 
the energies of electron orbitals for each point in the Brillouin zone. Ab initio and 
semi-empirical calculations yield orbital energies; therefore, they can be applied to band 
structure calculations. Since it is time-consuming to calculate the energy for a molecule, it 
is even more time-consuming to calculate them for the entire list of points in the Brillouin 
zone. 
Chemical dynamics

Once the electronic and nuclear variables are separated (within the Born-Oppenheimer
representation), in the time-dependent approach, the wave packet corresponding to the
nuclear degrees of freedom is propagated via the time evolution operator
associated with the time-dependent Schrödinger equation (for the full molecular
Hamiltonian). In the complementary energy-dependent approach, the time-independent
Schrödinger equation is solved using the scattering theory formalism. The potential
representing the interatomic interaction is given by the potential energy surfaces. In 
general, the potential energy surfaces are coupled via the vibronic coupling terms. 

The most popular methods for propagating the wave packet associated with the molecular
geometry are 

• the split operator technique, 

• the Multi-Configuration Time-Dependent Hartree method (MCTDH), 

• the semiclassical method. 

Molecular dynamics (MD) examines (using Newton's laws of motion) the time-dependent 
behavior of systems, including vibrations or Brownian motion, using a classical mechanical 
description. MD combined with density functional theory leads to the Car-Parrinello 
method. 
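
As a hedged illustration of classical molecular dynamics, the sketch below integrates Newton's equations of motion for a single particle on a harmonic spring. The velocity Verlet integrator used here is one commonly used scheme and is an assumption of this example, as are the unit mass, unit force constant and time step.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v for n_steps of length dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
    return x, v


def spring_force(x, k=1.0):
    """Restoring force of a harmonic spring with force constant k."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    x, v = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
    print(f"x = {x:.4f}, v = {v:.4f}, E = {total_energy(x, v):.6f}")
```

The total energy stays very close to its initial value of 0.5 over the trajectory, which is the usual first check on an integrator and time step.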

Interpreting molecular wave functions 

The Atoms in Molecules model was developed by Richard Bader in order to
effectively link the quantum mechanical picture of a molecule, as an electronic
wavefunction, to chemically useful older models such as the theory of Lewis pairs and the 
valence bond model. Bader has demonstrated that these empirically useful models are 
connected with the topology of the quantum charge density. This method improves on the 
use of Mulliken population analysis. 

Software packages 

There are many self-sufficient software packages used by computational chemists. Some 
include many methods covering a wide range, while others concentrate on a very specific
range or even a single method. Details of most of them can be found in: 

• Quantum chemistry and solid state physics software supporting several methods. 

• Molecular mechanics programs. 

• Semi-empirical programs. 

• Valence Bond programs. 

• Biomolecular modelling programs: proteins, nucleic acid. 
See also

Mathematical chemistry 

Molecular modeling 

Molecular graphics 

Monte Carlo molecular modeling 

Quantum chemistry 

Basis set (chemistry) 

Molecular dynamics 

Bioinformatics 

Cheminformatics 

Computational Chemistry List 

Important publications in computational chemistry 

International Academy of Quantum Molecular Science 

Computational Science 

Statistical mechanics 

Molecule 

Force field in Chemistry 

Force field implementation 
Mathematical chemistry

Mathematical chemistry is the area of research engaged in the novel and nontrivial 
applications of mathematics to chemistry; it concerns itself principally with the 
mathematical modeling of chemical phenomena. Mathematical chemistry has also 
sometimes been called computer chemistry, but should not be confused with 
computational chemistry. 

Major areas of research in mathematical chemistry include chemical graph theory, which 
deals with topics such as the mathematical study of isomerism and the development of 
topological descriptors or indices which find application in quantitative structure-property 
relationships; chemical aspects of group theory, which finds applications in stereochemistry 
and quantum chemistry; and topological aspects of chemistry. 

The history of the approach may be traced back to the 18th century. Georg Helm published a
treatise titled "The Principles of Mathematical Chemistry: The Energetics of Chemical
Phenomena" in 1894. Some of the more contemporary periodical publications
specializing in the field are MATCH Communications in Mathematical and in Computer
Chemistry, first published in 1975, and the Journal of Mathematical Chemistry, first
published in 1987.

The basic models for mathematical chemistry are the molecular graph and the topological index.

See also 

• Cheminformatics 

• Computational chemistry 

• Combinatorial chemistry 

• Molecular modeling 
Monte Carlo method



Monte Carlo methods are a class of computational algorithms that rely on repeated 
random sampling to compute their results. Monte Carlo methods are often used when 
simulating physical and mathematical systems. Because of their reliance on repeated 
computation and random or pseudo-random numbers, Monte Carlo methods are most 
suited to calculation by a computer. Monte Carlo methods tend to be used when it is 
unfeasible or impossible to compute an exact result with a deterministic algorithm. 

Monte Carlo simulation methods are especially useful in studying systems with a large 
number of coupled degrees of freedom, such as fluids, disordered materials, strongly 
coupled solids, and cellular structures (see cellular Potts model). More broadly, Monte 
Carlo methods are useful for modeling phenomena with significant uncertainty in inputs, 
such as the calculation of risk in business. These methods are also widely used in 
mathematics: a classic use is for the evaluation of definite integrals, particularly 
multidimensional integrals with complicated boundary conditions. It is a widely successful 
method in risk analysis when compared to alternative methods or human intuition. When 
Monte Carlo simulations have been applied in space exploration and oil exploration, actual 
observations of failures, cost overruns and schedule overruns are routinely better predicted 
by the simulations than by human intuition or alternative "soft" methods. 

The term "Monte Carlo method" was coined in the 1940s by physicists working on nuclear 
weapon projects in the Los Alamos National Laboratory. 
Overview

There is no single Monte Carlo method; instead, the 
term describes a large and widely-used class of 
approaches. However, these approaches tend to follow 
a particular pattern: 

1. Define a domain of possible inputs. 

2. Generate inputs randomly from the domain. 

3. Perform a deterministic computation using the 
inputs. 

4. Aggregate the results of the individual computations 
into the final result. 

For example, the value of π can be approximated using
a Monte Carlo method:

1. Draw a square on the ground, then inscribe a circle
within it. From plane geometry, the ratio of the area
of an inscribed circle to that of the surrounding
square is π/4.

2. Uniformly scatter some objects of uniform size
throughout the square, for example grains of rice or
sand.

3. Since the two areas are in the ratio π/4, the objects
should fall in the areas in approximately the same
ratio. Thus, counting the number of objects in the
circle and dividing by the total number of objects in
the square will yield an approximation for π/4.
Multiplying the result by 4 will then yield an
approximation for π itself.
[Figure: The Monte Carlo method can be illustrated as a game of battleship. First a player
makes some random shots. Next the player applies algorithms (i.e. a battleship is four dots
in the vertical or horizontal direction). Finally, based on the outcome of the random
sampling and the algorithm, the player can determine the likely locations of the other
player's ships.]



Notice how the π approximation follows the general pattern of Monte Carlo algorithms.
First, we define a domain of inputs: in this case, it's the
square which circumscribes our circle. Next, we generate inputs randomly (scatter
individual grains within the square), then perform a computation on each input (test
whether it falls within the circle). At the end, we aggregate the results into our final result,
the approximation of π. Note, also, two other common properties of Monte Carlo methods:
the computation's reliance on good random numbers, and its slow convergence to a better
approximation as more data points are sampled. If grains are purposefully dropped into
only, for example, the center of the circle, they will not be uniformly distributed, and so our
approximation will be poor. An approximation will also be poor if only a few grains are
randomly dropped into the whole square. Thus, the approximation of π will become more
accurate both as the grains are dropped more uniformly and as more are dropped.
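
The four steps can be sketched in a few lines of Python. The function name, the choice of the square [-1, 1] x [-1, 1] and the use of the standard library generator are assumptions of this illustration.

```python
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi by scattering points uniformly over a square.

    The square is [-1, 1] x [-1, 1]; the inscribed circle has radius 1.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)       # step 2: generate a random input
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:         # step 3: deterministic test
            inside += 1
    return 4.0 * inside / n_points       # step 4: aggregate the results


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n, seed=0))
```

Running the loop with increasing numbers of points shows the slow convergence described above: each additional correct digit requires roughly a hundred times more samples.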
History

The name "Monte Carlo" was popularized by physics researchers Stanislaw Ulam, Enrico 
Fermi, John von Neumann, and Nicholas Metropolis, among others; the name is a reference 
to the Monte Carlo Casino in Monaco where Ulam's uncle would borrow money to 
gamble. The use of randomness and the repetitive nature of the process are analogous to 
the activities conducted at a casino. 

Random methods of computation and experimentation (generally considered forms of 
stochastic simulation) can be arguably traced back to the earliest pioneers of probability 
theory (see, e.g., Buffon's needle, and the work on small samples by William Sealy Gosset), 
but are more specifically traced to the pre-electronic computing era. The general difference 
usually described about a Monte Carlo form of simulation is that it systematically "inverts" 
the typical mode of simulation, treating deterministic problems by first finding a 
probabilistic analog (see Simulated annealing). Previous methods of simulation and 
statistical sampling generally did the opposite: using simulation to test a previously 
understood deterministic problem. Though examples of an "inverted" approach do exist 
historically, they were not considered a general method until the popularity of the Monte 
Carlo method spread. 

Perhaps the most famous early use was by Enrico Fermi in the 1930s, when he used a random
method to calculate the properties of the newly-discovered neutron. Monte Carlo methods
were central to the simulations required for the Manhattan Project, though they were severely
limited by the computational tools at the time. Therefore, it was only after electronic
computers were first built (from 1945 on) that Monte Carlo methods began to be studied in 
depth. In the 1950s they were used at Los Alamos for early work relating to the 
development of the hydrogen bomb, and became popularized in the fields of physics, 
physical chemistry, and operations research. The Rand Corporation and the U.S. Air Force 
were two of the major organizations responsible for funding and disseminating information 
on Monte Carlo methods during this time, and they began to find a wide application in 
many different fields. 

Uses of Monte Carlo methods require large amounts of random numbers, and it was their 
use that spurred the development of pseudorandom number generators, which were far 
quicker to use than the tables of random numbers which had been previously used for 
statistical sampling. 

Applications 

As mentioned, Monte Carlo simulation methods are especially useful for modeling 
phenomena with significant uncertainty in inputs and in studying systems with a large 
number of coupled degrees of freedom. Specific areas of application include: 

Physical sciences 

Monte Carlo methods are very important in computational physics, physical chemistry, and 
related applied fields, and have diverse applications from complicated quantum 
chromodynamics calculations to designing heat shields and aerodynamic forms. The Monte 
Carlo method is widely used in statistical physics, in particular, Monte Carlo molecular 
modeling as an alternative for computational molecular dynamics; see Monte Carlo method 
in statistical physics. In experimental particle physics, these methods are used for 
designing detectors, understanding their behavior and comparing experimental data to
theory. 

Monte Carlo methods are also used in the ensemble models that form the basis of modern 
weather forecasting operations. 

Design and visuals 

Monte Carlo methods have also proven efficient in solving coupled integro-differential
equations of radiation fields and energy transport, and thus these methods have been used
in global illumination computations which produce photorealistic images of virtual 3D
models, with applications in video games, architecture, design, computer generated films
and special effects in cinema.

Finance and business 

Monte Carlo methods in finance are often used to calculate the value of companies, to 
evaluate investments in projects at corporate level or to evaluate financial derivatives. The 
Monte Carlo method is intended for financial analysts who want to construct stochastic or 
probabilistic financial models as opposed to the traditional static and deterministic models. 
For its use in the insurance industry, see stochastic modelling. 

Telecommunications 

When planning a wireless network, design must be proved to work for a wide variety of 
scenarios that depend mainly on the number of users, their locations and the services they 
want to use. Monte Carlo methods are typically used to generate these users and their 
states. The network performance is then evaluated and, if results are not satisfactory, the 
network design goes through an optimization process. 

Games 

Monte Carlo methods have recently been applied in game playing related artificial 
intelligence theory. Most notably the game of Go has seen remarkably successful Monte 
Carlo algorithm based computer players. One of the main problems that this approach has 
in game playing is that it sometimes misses an isolated, very good move. These approaches 
are often strong strategically but weak tactically, as tactical decisions tend to rely on a 
small number of crucial moves which are easily missed by the randomly searching Monte 
Carlo algorithm. 

Monte Carlo simulation versus "what if" scenarios 

The opposite of Monte Carlo simulation might be considered deterministic modeling using 
single-point estimates. Each uncertain variable within a model is assigned a "best guess" 
estimate. Various combinations of each input variable are manually chosen (such as best 
case, worst case, and most likely case), and the results recorded for each so-called "what if" 
scenario. 

By contrast, Monte Carlo simulation considers random sampling of probability distribution
functions as model inputs to produce hundreds or thousands of possible outcomes instead
of a few discrete scenarios. The results provide probabilities of different outcomes
occurring. For example, a comparison of a spreadsheet cost construction model run
using traditional "what if" scenarios, and then run again with Monte Carlo simulation and
triangular probability distributions shows that the Monte Carlo analysis has a narrower
range than the "what if" analysis. This is because the "what if" analysis gives equal weight
to all scenarios.
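
The comparison can be sketched with a toy cost model. The three cost items and their best-case, most-likely and worst-case values are invented for illustration, and the 5th to 95th percentile band is an assumed way of summarizing the Monte Carlo range.

```python
import random

# Hypothetical cost items: (best case, most likely, worst case).
COST_ITEMS = [(80, 100, 150), (40, 50, 90), (20, 30, 45)]


def what_if_range(items):
    """Total cost in the all-best-case and all-worst-case scenarios."""
    best = sum(low for low, _, _ in items)
    worst = sum(high for _, _, high in items)
    return best, worst


def monte_carlo_range(items, n_trials=20_000, seed=0):
    """5th and 95th percentiles of total cost under triangular inputs."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.triangular(low, high, mode) for low, mode, high in items)
        for _ in range(n_trials)
    )
    return totals[int(0.05 * n_trials)], totals[int(0.95 * n_trials) - 1]


if __name__ == "__main__":
    print("what-if range:    ", what_if_range(COST_ITEMS))
    print("Monte Carlo range:", monte_carlo_range(COST_ITEMS))
```

The "what if" extremes require every item to hit its best or worst case simultaneously, which the sampled totals almost never do, so the Monte Carlo band lies well inside them.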

For an application, see quantifying uncertainty under corporate finance. 

Use in mathematics 

In general, Monte Carlo methods are used in mathematics to solve various problems by 
generating suitable random numbers and observing the fraction of the numbers obeying
some property or properties. The method is useful for obtaining numerical solutions to 
problems which are too complicated to solve analytically. The most common application of 
the Monte Carlo method is Monte Carlo integration. 

Integration 

Deterministic methods of numerical integration operate by taking a number of evenly 
spaced samples from a function. In general, this works very well for functions of one 
variable. However, for functions of vectors, deterministic quadrature methods can be very 
inefficient. To numerically integrate a function of a two-dimensional vector, equally spaced 
grid points over a two-dimensional surface are required. For instance a 10x10 grid requires
100 points. If the vector has 100 dimensions, the same spacing on the grid would require
10^100 points, far too many to be computed. 100 dimensions is by no means unreasonable,
since in many physical problems, a "dimension" is equivalent to a degree of freedom. (See 
Curse of dimensionality.) 

Monte Carlo methods provide a way out of this exponential time-increase. As long as the 
function in question is reasonably well-behaved, it can be estimated by randomly selecting 
points in 100-dimensional space, and taking some kind of average of the function values at 
these points. By the law of large numbers, this method will display 1/√N
convergence, i.e. quadrupling the number of sampled points will halve the error,
regardless of the number of dimensions. 
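
As a sketch under simple assumptions, the code below integrates f(x) = x_1² + ... + x_d² over the unit hypercube in d = 100 dimensions, where the exact answer is d/3. The test function and the function names are chosen for illustration only; a grid with ten points per axis would need 10^100 evaluations, whereas the random estimate uses a few thousand.

```python
import random


def mc_integrate(func, dim, n_points, seed=None):
    """Average func over n_points uniform random points in [0, 1]**dim.

    The unit hypercube has volume 1, so the sample mean is the estimate
    of the integral.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        total += func(point)
    return total / n_points


def sum_of_squares(point):
    """Test integrand; its integral over [0, 1]**d is d / 3."""
    return sum(x * x for x in point)


if __name__ == "__main__":
    exact = 100 / 3
    for n in (100, 400, 1600, 6400):
        estimate = mc_integrate(sum_of_squares, 100, n, seed=1)
        print(f"N = {n:5d}  estimate = {estimate:.4f}  error = {abs(estimate - exact):.4f}")
```

Each row uses four times as many points as the previous one; on average the error halves, although any single run fluctuates around that trend.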

A refinement of this method is to somehow make the points random, but more likely to 
come from regions of high contribution to the integral than from regions of low 
contribution. In other words, the points should be drawn from a distribution similar in form 
to the integrand. Understandably, doing this precisely is just as difficult as solving the 
integral in the first place, but there are approximate methods available: from simply making 
up an integrable function thought to be similar, to one of the adaptive routines discussed in 
the topics listed below. 
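
A minimal sketch of this idea, known as importance sampling, integrates f(x) = 3x² over [0, 1] (exact value 1). The sampling density p(x) = 2x is a made-up function chosen because it roughly follows the integrand and is easy to sample; both it and the function names are assumptions of this example.

```python
import math
import random


def integrand(x):
    """Function to integrate over [0, 1]; the exact integral is 1."""
    return 3.0 * x * x


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def plain_estimate(n_points, seed=None):
    """Uniform sampling: average f(x) for x uniform on [0, 1)."""
    rng = random.Random(seed)
    return mean_and_variance([integrand(rng.random()) for _ in range(n_points)])


def importance_estimate(n_points, seed=None):
    """Sample x from p(x) = 2x and average the weights f(x) / p(x)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_points):
        x = math.sqrt(1.0 - rng.random())   # inverse-CDF draw, x in (0, 1]
        weights.append(integrand(x) / (2.0 * x))
    return mean_and_variance(weights)


if __name__ == "__main__":
    print("uniform sampling:    mean %.4f, variance %.4f" % plain_estimate(50_000, seed=1))
    print("importance sampling: mean %.4f, variance %.4f" % importance_estimate(50_000, seed=1))
```

Both estimators converge to the same integral, but the weights f(x)/p(x) vary much less than f(x) itself, so the importance-sampled estimate has a smaller variance for the same number of points.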

A similar approach involves using low-discrepancy sequences instead: the quasi-Monte
Carlo method. Quasi-Monte Carlo methods can often be more efficient at numerical
integration because the sequence "fills" the area better in a sense and samples more of the 
most important points that can make the simulation converge to the desired solution more 
quickly. 
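
One widely used low-discrepancy sequence is the Halton sequence, built from the radical inverse of the integers in different prime bases. The sketch below uses bases 2 and 3 to fill the unit square and estimates π from the fraction of points inside the quarter circle; the choice of sequence, bases and test problem are assumptions of this illustration.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of index about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_pi(n_points):
    """Estimate pi from Halton points (bases 2 and 3) in the unit square."""
    inside = 0
    for i in range(1, n_points + 1):
        x = radical_inverse(i, 2)
        y = radical_inverse(i, 3)
        if x * x + y * y <= 1.0:   # inside the quarter circle of radius 1
            inside += 1
    return 4.0 * inside / n_points


if __name__ == "__main__":
    print([radical_inverse(i, 2) for i in range(1, 8)])
    print(halton_pi(10_000))
```

The first few base-2 values (0.5, 0.25, 0.75, 0.125, ...) show how each new point lands in the largest remaining gap, which is the sense in which the sequence "fills" the area evenly. The sequence is deterministic, so repeated runs give identical results.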
Integration methods

• Direct sampling methods 

• Importance sampling 

• Stratified sampling 

• Recursive stratified sampling 

• VEGAS algorithm 

• Random walk Monte Carlo including Markov chains 

• Metropolis-Hastings algorithm 

• Gibbs sampling 

Optimization 

Another powerful and very popular application for random numbers in numerical simulation 
is in numerical optimization. These problems use functions of some often large-dimensional 
vector that are to be minimized (or maximized). Many problems can be phrased in this way: 
for example a computer chess program could be seen as trying to find the optimal set of, 
say, 10 moves which produces the best evaluation function at the end. The traveling 
salesman problem is another optimization problem. There are also applications to 
engineering design, such as multidisciplinary design optimization. 

Most Monte Carlo optimization methods are based on random walks. Essentially, the
program will move around a marker in multi-dimensional space, tending to move in
directions which lead to a lower function value, but sometimes moving against the gradient.
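
The random-walk idea can be sketched as below. This is a minimal illustration using a Metropolis-style acceptance rule at a fixed "temperature"; the function name `random_walk_minimize`, the test function `bowl`, and all parameter values are hypothetical choices, not a specific published method.

```python
import math
import random


def random_walk_minimize(f, start, n_steps, step_size, temperature, rng):
    """Move a marker at random, preferring lower values of f."""
    current = list(start)
    current_value = f(current)
    best, best_value = list(current), current_value
    for _ in range(n_steps):
        candidate = [c + rng.uniform(-step_size, step_size) for c in current]
        candidate_value = f(candidate)
        delta = candidate_value - current_value
        # Downhill moves are always accepted; uphill moves are accepted
        # occasionally, with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_value = candidate, candidate_value
            if current_value < best_value:
                best, best_value = list(current), current_value
    return best, best_value


def bowl(x):
    # Hypothetical test function with its minimum value 0 at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, best_value = random_walk_minimize(
    bowl, [5.0, 5.0], n_steps=5000, step_size=0.5, temperature=0.05,
    rng=random.Random(3),
)
print(best, best_value)
```

Lowering `temperature` gradually during the run, rather than keeping it fixed as here, gives the scheme usually called simulated annealing.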

Optimization methods 

• Evolution strategy 

• Genetic algorithms 

• Parallel tempering 

• Simulated annealing 

• Stochastic optimization 

• Stochastic tunneling 

Inverse problems 

Probabilistic formulation of inverse problems leads to the definition of a probability 
distribution in the model space. This probability distribution combines a priori information 
with new information obtained by measuring some observable parameters (data). As, in the 
general case, the theory linking data with model parameters is nonlinear, the a posteriori 
probability in the model space may not be easy to describe (it may be multimodal, some 
moments may not be defined, etc.). 

When analyzing an inverse problem, obtaining a maximum likelihood model is usually not 
sufficient, as we normally also wish to have information on the resolution power of the data. 
In the general case we may have a large number of model parameters, and an inspection of 
the marginal probability densities of interest may be impractical, or even useless. But it is 
possible to pseudorandomly generate a large collection of models according to the posterior 
probability distribution and to analyze and display the models in such a way that 
information on the relative likelihoods of model properties is conveyed to the spectator. 
This can be accomplished by means of an efficient Monte Carlo method, even in cases 
where no explicit formula for the a priori distribution is available. 

The best-known importance sampling method, the Metropolis algorithm, can be
generalized, and this gives a method that allows analysis of (possibly highly nonlinear) 
inverse problems with complex a priori information and data with an arbitrary noise 
distribution. For details, see Mosegaard and Tarantola (1995), or Tarantola (2005). 
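
To make the sampling of a posterior distribution concrete, here is a minimal sketch of the plain Metropolis algorithm (not the generalized version referred to above) applied to a toy nonlinear inverse problem. Everything in it is assumed for illustration: a single model parameter m, a forward relation d = m^2, Gaussian noise of width `sigma`, and a uniform prior on [-3, 3]. The posterior is then bimodal, with modes near m = +1 and m = -1.

```python
import math
import random


def log_posterior(m, observed, sigma, bound=3.0):
    """Log of (uniform prior on [-bound, bound]) times (Gaussian likelihood)."""
    if abs(m) > bound:
        return float("-inf")
    residual = observed - m * m  # nonlinear forward relation d = m^2 (assumed)
    return -0.5 * (residual / sigma) ** 2


def metropolis(n_steps, start, step_size, observed, sigma, rng):
    m = start
    logp = log_posterior(m, observed, sigma)
    samples = []
    for _ in range(n_steps):
        candidate = m + rng.uniform(-step_size, step_size)
        logp_candidate = log_posterior(candidate, observed, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_candidate - logp)):
            m, logp = candidate, logp_candidate
        samples.append(m)
    return samples


samples = metropolis(20000, 0.5, 2.5, observed=1.0, sigma=0.2, rng=random.Random(42))
positive_fraction = sum(1 for m in samples if m > 0) / len(samples)
mean_abs = sum(abs(m) for m in samples) / len(samples)
print(positive_fraction, mean_abs)
```

The collection `samples` plays the role of the "large collection of models": a histogram of it displays both modes, which a single maximum likelihood model would not reveal.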

Computational mathematics 

Monte Carlo methods are useful in many areas of computational mathematics, where a 
lucky choice can find the correct result. A classic example is Rabin's algorithm for primality 
testing: for any n which is not prime, a random x has at least a 75% chance of proving that 
n is not prime. Hence, if n is not prime, but x says that it might be, we have observed at
most a 1-in-4 event. If 10 different random x say that "n is probably prime" when it is not,
we have observed a one-in-a-million event. In general, a Monte Carlo algorithm of this kind
produces one answer that comes with a guarantee (n is composite, and x proves it so) and
another that comes without one, but with a guarantee of not getting this answer when it is
wrong too often: in this case at most 25% of the time. See also Las Vegas algorithm for a related,
but different, idea.
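
A minimal sketch of such a test is shown below, in the form usually called the Miller-Rabin test. The function name `is_probable_prime`, the small-prime pre-check and the default of 10 rounds are choices made for this example.

```python
import random


def is_probable_prime(n, rounds=10, rng=None):
    """Return False if n is certainly composite, True if n is probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = rng.randrange(2, n - 1)  # random candidate witness
        y = pow(x, d, n)
        if y in (1, n - 1):
            continue  # x does not prove compositeness
        for _ in range(s - 1):
            y = pow(y, 2, n)
            if y == n - 1:
                break
        else:
            return False  # x is a witness: n is certainly composite
    return True  # no witness found in any round: n is probably prime


print(is_probable_prime(2 ** 61 - 1))  # a known Mersenne prime
print(is_probable_prime(221))          # 221 = 13 * 17
print(0.25 ** 10)                      # bound on the error after 10 rounds
```

The last line evaluates the bound quoted above: ten independent rounds, each wrong at most a quarter of the time, give an error probability below about one in a million.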

Monte Carlo and random numbers 

Interestingly, Monte Carlo simulation methods do not always require truly random numbers 
to be useful — while for some applications, such as primality testing, unpredictability is 
vital (see Davenport (1995)). Many of the most useful techniques use deterministic, 
pseudo-random sequences, making it easy to test and re-run simulations. The only quality 
usually necessary to make good simulations is for the pseudo-random sequence to appear 
"random enough" in a certain sense. 

What this means depends on the application, but typically they should pass a series of 
statistical tests. Testing that the numbers are uniformly distributed or follow another 
desired distribution when a large enough number of elements of the sequence are 
considered is one of the simplest, and most common ones. 
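
Both points, reproducibility from a seed and a simple uniformity check, can be sketched as follows. The chi-square statistic over equal-width bins is one standard choice of test; the seed, sample size and number of bins are assumptions for the example.

```python
import random


def chi_square_uniform(samples, n_bins):
    """Chi-square statistic comparing binned samples in [0, 1) with a uniform law."""
    counts = [0] * n_bins
    for u in samples:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    expected = len(samples) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


# Two generators with the same seed produce the same deterministic sequence,
# which is what makes a simulation easy to test and re-run.
rng_a = random.Random(2024)
rng_b = random.Random(2024)
run_a = [rng_a.random() for _ in range(10000)]
run_b = [rng_b.random() for _ in range(10000)]
print(run_a == run_b)

# For 10 bins the statistic is compared with a chi-square distribution
# with 9 degrees of freedom; very large values indicate non-uniformity.
statistic = chi_square_uniform(run_a, 10)
print(statistic)
```

Passing this test is necessary but far from sufficient; practical test suites also check correlations between successive values.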

See also 

General 

Auxiliary field Monte Carlo 
Bootstrapping (statistics) 
Demon algorithm 
Evolutionary Computation 
Las Vegas algorithm 
Markov chain 
Molecular dynamics 
Monte Carlo option model 
Monte Carlo integration 
Quasi-Monte Carlo method 
Random number generator 
Randomness 
Resampling (statistics) 

Application areas

• Graphics, particularly for ray tracing; a version of the Metropolis-Hastings algorithm is
also used for ray tracing where it is known as Metropolis light transport
• Modeling light transport in biological tissue
• Monte Carlo methods in finance
• Reliability engineering
• In simulated annealing for protein structure prediction
• In semiconductor device research, to model the transport of current carriers
• Environmental science, dealing with contaminant behavior
• Search And Rescue and Counter-Pollution. Models used to predict the drift of a life raft
or movement of an oil slick at sea.
• In probabilistic design for simulating and understanding the effects of variability
• In physical chemistry, particularly for simulations involving atomic clusters
• In biomolecular simulations
• In polymer physics
    • Bond fluctuation model
• In computer science
    • Las Vegas algorithm
    • LURCH
    • Computer go
    • General Game Playing
• Modeling the movement of impurity atoms (or ions) in plasmas in existing and tokamaks
(e.g.: DIVIMP).
• Nuclear and particle physics codes using the Monte Carlo method:
    • GEANT: CERN's simulation of high energy particles interacting with a detector.
    • CompHEP, PYTHIA: Monte-Carlo generators of particle collisions
    • MCNP(X): LANL's radiation transport codes
    • MCU: universal computer code for simulation of particle transport (neutrons, photons,
    electrons) in three-dimensional systems by means of the Monte Carlo method
    • EGS: Stanford's simulation code for coupled transport of electrons and photons
    • PEREGRINE: LLNL's Monte Carlo tool for radiation therapy dose calculations
    • BEAMnrc: Monte Carlo code system for modeling radiotherapy sources (LINACs)
    • PENELOPE: Monte Carlo for coupled transport of photons and electrons, with
    applications in radiotherapy
    • MONK: Serco Assurance's code for the calculation of k-effective of nuclear systems
• Modelling of foam and cellular structures
• Modeling of tissue morphogenesis
• Computation of holograms
• Phylogenetic analysis, i.e. Bayesian inference, Markov chain Monte Carlo

Other methods employing Monte Carlo

Assorted random models, e.g. self-organised criticality 

Direct simulation Monte Carlo 

Dynamic Monte Carlo method 

Kinetic Monte Carlo 

Quantum Monte Carlo 

Quasi-Monte Carlo method using low-discrepancy sequences and self avoiding walks 

Semiconductor charge transport and the like 

Electron microscopy beam-sample interactions 

Stochastic optimization 

Cellular Potts model 

Markov chain Monte Carlo 

Cross-entropy method 

Applied information economics 

Monte Carlo localization 

Notes 

[1] Douglas Hubbard, "How to Measure Anything: Finding the Value of Intangibles in Business", p. 46, John Wiley & Sons, 2007.
[2] Douglas Hubbard, "The Failure of Risk Management: Why It's Broken and How to Fix It", John Wiley & Sons, 2009.
[3] Nicholas Metropolis (1987), "The beginning of the Monte Carlo method", Los Alamos Science (1987 Special Issue dedicated to Stanislaw Ulam): 125-130, http://library.lanl.gov/la-pubs/00326866.pdf
[4] Douglas Hubbard, "How to Measure Anything: Finding the Value of Intangibles in Business", p. 46, John Wiley & Sons, 2007.
[5] David Vose, "Risk Analysis, A Quantitative Guide", Second Edition, p. 13, John Wiley & Sons, 2000.
[6] Ibid, p. 16
[7] Ibid, p. 17, showing graph
[8] http://www.ipgp.jussieu.fr/~tarantola/Files/Professional/Papers_PDF/MonteCarlo_latex.pdf

External links

• Overview and reference list (http://mathworld.wolfram.com/MonteCarloMethod.html),
Mathworld

• Introduction to Monte Carlo Methods (http://www.ipp.mpg.de/de/for/bereiche/
stellarator/Compsci/CompScience/csep/csep1.phy.ornl.gov/mc/mc.html),
Computational Science Education Project

• Overview of formulas used in Monte Carlo simulation (http://www.sitmo.com/eqcat/15),
the Quant Equation Archive, at sitmo.com

• The Basics of Monte Carlo Simulations (http://www.chem.unl.edu/zeng/joy/mclab/
mcintro.html), University of Nebraska-Lincoln

• Introduction to Monte Carlo simulation (http://office.microsoft.com/en-us/assistance/
HA011118931033.aspx) (for Excel), Wayne L. Winston

• Monte Carlo Methods - Overview and Concept (http://www.brighton-webs.co.uk/
montecarlo/concept.asp), brighton-webs.co.uk

• Molecular Monte Carlo Intro (http://www.cooper.edu/engineering/chemechem/
monte.html), Cooper Union

• Monte Carlo techniques applied in physics (http://homepages.ed.ac.uk/s0095122/
Appletl-page.htm)

• MonteCarlo Simulation in Finance (http://www.global-derivatives.com/maths/k-o.

• Approximation of π with the Monte Carlo Method (http://twt.mpei.ac.ru/MAS/
Worksheets/approxpi.mcd)

• Risk Analysis in Investment Appraisal (http://papers.ssrn.com/sol3/papers.
cfm?abstract_id=265905), The Application of Monte Carlo Methodology in Project
Appraisal, Savvakis C. Savvides

• Probabilistic Assessment of Structures using the Monte Carlo method (http://en.
wikiversity.org/wiki/Probabilistic_Assessment_of_Structures), Wikiversity paper for
students of Structural Engineering

• A very intuitive and comprehensive introduction to Quasi-Monte Carlo methods (http://
www.puc-rio.br/marco.ind/quasi_mc.html)

• Pricing using Monte Carlo simulation (http://knol.google.com/k/giancarlo-vercellino/
pricing-using-monte-carlo-simulation/lld5i2rgd9gn5/3#), a practical example, Prof.
Giancarlo Vercellino

Software

• The BUGS project (http://www.mrc-bsu.cam.ac.uk/bugs/) (including WinBUGS and
OpenBUGS)

• Monte Carlo Simulation, Resampling, Bootstrap tool (http://www.statistics101.net)

• YASAI: Yet Another Simulation Add-In (http://yasai.rutgers.edu/) - Free Monte Carlo
Simulation Add-In for Excel created by Rutgers University

Electronic structure methods

• Tight binding
• Nearly-free electron model
• Hartree-Fock
• Modern valence bond
• Generalized valence bond
• Møller-Plesset perturbation theory
• Configuration interaction
• Coupled cluster
• Multi-configurational self-consistent field
• Density functional theory
• Quantum chemistry composite methods
• Quantum Monte Carlo
• k·p perturbation theory
• Muffin-tin approximation
• LCAO method

Quantum Monte Carlo is a large class of computer algorithms that simulate quantum 
systems with the idea of solving the many-body problem. They use, in one way or another, 
the Monte Carlo method to handle the many-dimensional integrals that arise. Quantum 
Monte Carlo allows a direct representation of many-body effects in the wavefunction, at the
cost of statistical uncertainty that can be reduced with more simulation time. For bosons, 
there exist numerically exact and polynomial-scaling algorithms. For fermions, there exist 
very good approximations and numerically exact exponentially scaling quantum Monte 
Carlo algorithms, but none that are both. 

Background 

In principle, any physical system can be described by the many-body Schrödinger equation
as long as the constituent particles are not moving "too" fast; that is, they are not moving
near the speed of light. This includes the electrons in almost every material in the world, so
if we could solve the Schrödinger equation, we could predict the behavior of any electronic
system, which has important applications in fields from computers to biology. This also
includes the nuclei in Bose-Einstein condensates and superfluids such as liquid helium. The
difficulty is that the Schrödinger equation involves a function of 3N coordinates, three for
each of the N particles, and is difficult to solve in a reasonable amount of time (less than
2 years) even using parallel computing technology. Traditionally, theorists have
approximated the many-body wave function as an antisymmetric function of one-body
orbitals. This kind of formulation either limits the possible wave functions, as
in the case of the Hartree-Fock (HF) approximation, or converges very slowly, as in
configuration interaction. One of the reasons for the difficulty with an HF initial estimate
(ground state seed, also known as Slater determinant) is that it is very difficult to model the
electronic and nuclear cusps in the wavefunction. However, one does not generally model
the cusps at this stage of the approximation. As two particles approach each other, the
wavefunction has exactly known derivatives (the cusp conditions).

Quantum Monte Carlo is a way around these problems because it allows us to model a
many-body wavefunction of our choice directly. Specifically, we can use a Hartree-Fock
approximation as our starting point and then multiply it by any symmetric function, of
which Jastrow functions are typical, designed to enforce the cusp conditions. Most methods
aim at computing the ground-state wavefunction of the system, with the exception of path 
integral Monte Carlo and finite-temperature auxiliary field Monte Carlo, which calculate the 
density matrix. 

There are several quantum Monte Carlo methods, each of which uses Monte Carlo in 
different ways to solve the many-body problem: 

Quantum Monte Carlo methods 

• Variational Monte Carlo : A good place to start; it is commonly used in many sorts of 
quantum problems. 

• Diffusion Monte Carlo : The most common high-accuracy method for electrons (that is, 
chemical problems), since it comes quite close to the exact ground-state energy fairly 
efficiently. Also used for simulating the quantum behavior of atoms, etc. 

• Path integral Monte Carlo : Finite-temperature technique mostly applied to bosons where 
temperature is very important, especially superfluid helium. 

• Auxiliary field Monte Carlo : Usually applied to lattice problems, although there has been 
recent work on applying it to electrons in chemical systems. 

• Reptation Monte Carlo : Recent zero-temperature method related to path integral Monte
Carlo, with applications similar to diffusion Monte Carlo but with some different
tradeoffs.

• Gaussian quantum Monte Carlo 
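
As a minimal sketch of the first method in the list, the code below applies variational Monte Carlo to a toy problem that is assumed here purely for illustration: a single particle in a one-dimensional harmonic well, H = -(1/2) d^2/dx^2 + (1/2) x^2 in dimensionless units, with the trial wavefunction psi(x) = exp(-alpha x^2). For this choice the local energy is alpha + x^2 (1/2 - 2 alpha^2) and the exact variational energy is alpha/2 + 1/(8 alpha), which is lowest (0.5) at alpha = 0.5.

```python
import math
import random


def local_energy(x, alpha):
    # (H psi) / psi for psi(x) = exp(-alpha * x^2) in the harmonic well.
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha, n_steps, step_size, rng):
    """Average the local energy over points sampled from |psi|^2 by Metropolis."""
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        # Acceptance ratio |psi(x_new)|^2 / |psi(x)|^2.
        ratio = math.exp(-2.0 * alpha * (x_new * x_new - x * x))
        if rng.random() < ratio:
            x = x_new
        total += local_energy(x, alpha)
    return total / n_steps


rng = random.Random(1)
for alpha in (0.3, 0.4, 0.5, 0.6):
    energy = vmc_energy(alpha, 50000, 1.5, rng)
    print(alpha, round(energy, 4), round(alpha / 2 + 1 / (8 * alpha), 4))
```

At alpha = 0.5 the trial function is the exact ground state, so the local energy is constant and the estimate has no statistical noise; for other values the estimate fluctuates around a higher energy, as the variational principle requires.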

See also 

Stochastic Green Function (SGF) algorithm 

Monte Carlo method 

QMC@Home 

Quantum chemistry 

Density matrix renormalization group 

Time-evolving block decimation 

Metropolis algorithm 

Wavefunction optimization 

Dynamics of Markovian particles

Dynamics of Markovian particles (or DMP) is the basis of a theory for kinetics of 
particles in open heterogeneous systems. It can be looked upon as an application of the 
notion of stochastic process conceived as a physical entity; e.g. the particle moves because 
there is a transition probability acting on it. 

Two particular features of DMP might be noticed: (1) an ergodic-like relation between the
motion of a particle and the corresponding steady state, and (2) the classic notion of
geometric volume appears nowhere (e.g. a concept such as flow of "substance" is not
expressed as liters per time unit but as number of particles per time unit). Though
primitive, DMP has been applied to solve a classic paradox of the absorption of mercury
by fish and by mollusks. The theory has also been applied to a purely probabilistic
derivation of the fundamental physical principle of conservation of mass; this might be
looked upon as a contribution to the old and ongoing discussion of the relation between
physics and probability theory.

Sources 

• Bergner— DMP, a kinetics of macroscopic particles in open heterogeneous systems 

Molecular Networks and Complex Molecule Dynamics

Metabolic network 

A metabolic network is the complete set of metabolic and physical processes that 
determine the physiological and biochemical properties of a cell. As such, these networks 
comprise the chemical reactions of metabolism as well as the regulatory interactions that 
guide these reactions. 

With the sequencing of complete genomes, it is now possible to reconstruct the network of 
biochemical reactions in many organisms, from bacteria to human. Several of these 
networks are available online: Kyoto Encyclopedia of Genes and Genomes (KEGG)[1], 
EcoCyc [2] and BioCyc [3]. Metabolic networks are powerful tools for studying and
modelling metabolism, with uses ranging from the study of network topology with graph
theory to predictive toxicology and ADME.
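
As a small illustration of the graph-theoretic view, the sketch below represents the first three reactions of glycolysis as a directed graph of metabolites and computes each metabolite's degree. The data layout (`REACTIONS` as a dictionary of substrate and product lists) and the helper names are assumptions for the example; databases such as those listed above use their own formats.

```python
from collections import defaultdict

# Toy network: reaction name -> (substrates, products).
REACTIONS = {
    "hexokinase": (["glucose", "ATP"], ["glucose-6-phosphate", "ADP"]),
    "phosphoglucose isomerase": (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
    "phosphofructokinase": (
        ["fructose-6-phosphate", "ATP"],
        ["fructose-1,6-bisphosphate", "ADP"],
    ),
}


def metabolite_graph(reactions):
    """Directed graph with an edge substrate -> product for every reaction."""
    graph = defaultdict(set)
    for substrates, products in reactions.values():
        for substrate in substrates:
            for product in products:
                graph[substrate].add(product)
    return graph


def degrees(graph):
    """Total degree (incoming plus outgoing edges) of every metabolite."""
    degree = defaultdict(int)
    for source, targets in graph.items():
        for target in targets:
            degree[source] += 1
            degree[target] += 1
    return dict(degree)


graph = metabolite_graph(REACTIONS)
for metabolite, degree in sorted(degrees(graph).items(), key=lambda item: -item[1]):
    print(metabolite, degree)
```

Even in this tiny example the cofactors ATP and ADP are among the most connected nodes, which is why topological studies often treat such "currency" metabolites separately.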

See also 

• Metabolic network modelling 

• Metabolic pathway 

Topological dynamics

In mathematics, topological dynamics is a branch of the theory of dynamical systems in 
which qualitative, asymptotic properties of dynamical systems are studied from the 
viewpoint of general topology. 

Scope 

The central object of study in topological dynamics is a topological dynamical system,
i.e. a topological space, together with a continuous transformation, a continuous flow, or
more generally, a semigroup of continuous transformations of that space. The origins of
topological dynamics lie in the study of asymptotic properties of trajectories of systems of
autonomous ordinary differential equations, in particular, the behavior of limit sets and
various manifestations of "repetitiveness" of the motion, such as periodic trajectories,
recurrence and minimality, stability, non-wandering points. George Birkhoff is considered
to be the founder of the field. A structure theorem for minimal distal flows proved by Hillel
Furstenberg in the early 1960s inspired much work on classification of minimal flows. A lot
of research in the 1970s and 1980s was devoted to topological dynamics of one-dimensional
maps, in particular, piecewise linear self-maps of the interval and the circle.

Unlike the theory of smooth dynamical systems, where the main object of study is a smooth 
manifold with a diffeomorphism or a smooth flow, phase spaces considered in topological 
dynamics are general metric spaces (usually, compact). This necessitates development of
entirely different techniques but allows an extra degree of flexibility even in the smooth
setting, because invariant subsets of a manifold are frequently very complicated
topologically (cf. limit cycle, strange attractor); additionally, shift spaces arising via
symbolic representations can be considered on an equal footing with more geometric
actions. Topological dynamics has intimate connections with ergodic theory of dynamical
systems, and many fundamental concepts of the latter have topological analogues (cf.
Kolmogorov-Sinai entropy and topological entropy).

See also 

• Poincaré-Bendixson theorem

• Symbolic dynamics 

• Topological conjugacy 

[Figure: Protein before and after folding.]

Protein folding is the physical process by which a polypeptide folds into its characteristic
and functional three-dimensional structure from random coil. Each protein exists as an
unfolded polypeptide or random coil when translated from a sequence of mRNA to a linear
chain of amino acids. This polypeptide lacks any developed three-dimensional structure
(the left hand side of the neighboring figure). However, amino acids interact with each
other to produce a well-defined three-dimensional structure, the folded protein (the right
hand side of the figure), known as the native state. The resulting three-dimensional
structure is determined by the amino acid sequence.

For many proteins the correct three dimensional structure is essential to function. Failure 
to fold into the intended shape usually produces inactive proteins with different properties 
including toxic prions. Several neurodegenerative and other diseases are believed to result 
from the accumulation of misfolded (incorrectly folded) proteins. 

Known facts about the process

The relationship between folding and amino acid sequence 


The amino-acid sequence (or primary structure) of a protein defines its native
conformation. A protein molecule folds spontaneously during or after synthesis. While these
macromolecules may be regarded as "folding themselves", the process also depends on the
solvent (water or lipid bilayer), the concentration of salts, the temperature, and the
presence of molecular chaperones.

[Figure: Illustration of the main driving force behind protein structure formation. In the
compact fold (to the right), the hydrophobic amino acids (shown as black spheres) are in
general shielded from the solvent.]

Folded proteins usually have a hydrophobic core in which side chain packing stabilizes the
folded state, and charged or polar side chains occupy the solvent-exposed surface where
they interact with surrounding water. Minimizing the number of hydrophobic side-chains
exposed to water is an important driving force behind the folding process [6]. Formation of
intramolecular hydrogen bonds provides another important contribution to protein
stability. The strength of hydrogen bonds depends on their environment, thus H-bonds
enveloped in a hydrophobic core contribute more than H-bonds exposed to the aqueous
environment to the stability of the native state.

The process of folding in vivo often begins co-translationally, so that the N-terminus of the 
protein begins to fold while the C-terminal portion of the protein is still being synthesized 
by the ribosome. Specialized proteins called chaperones assist in the folding of other 
proteins. A well studied example is the bacterial GroEL system, which assists in the 
folding of globular proteins. In eukaryotic organisms chaperones are known as heat shock 
proteins. Although most globular proteins are able to assume their native state unassisted, 
chaperone-assisted folding is often necessary in the crowded intracellular environment to 
prevent aggregation; chaperones are also used to prevent misfolding and aggregation 
which may occur as a consequence of exposure to heat or other changes in the cellular 
environment. 

For the most part, scientists have been able to study many identical molecules folding 
together en masse. At the coarsest level, it appears that in transitioning to the native state, 
a given amino acid sequence takes on roughly the same route and proceeds through 
roughly the same intermediates and transition states. Often folding involves first the 
establishment of regular secondary and supersecondary structures, particularly alpha 
helices and beta sheets, and afterwards tertiary structure. Formation of quaternary 
structure usually involves the "assembly" or "coassembly" of subunits that have already
folded. The regular alpha helix and beta sheet structures fold rapidly because they are 
stabilized by intramolecular hydrogen bonds, as was first characterized by Linus Pauling. 
Protein folding may involve covalent bonding in the form of disulfide bridges formed 
between two cysteine residues or the formation of metal clusters. Shortly before settling 
into their more energetically favourable native conformation, molecules may pass through 
an intermediate "molten globule" state. 

The essential fact of folding, however, remains that the amino acid sequence of each 
protein contains the information that specifies both the native structure and the pathway to 
attain that state. This is not to say that nearly identical amino acid sequences always fold 
similarly. Conformations differ based on environmental factors as well; similar proteins 
fold differently based on where they are found. Folding is a spontaneous process 
independent of energy inputs from nucleoside triphosphates. The passage to the folded
state is mainly guided by hydrophobic interactions, formation of intramolecular hydrogen
bonds, and van der Waals forces, and it is opposed by conformational entropy.

Disruption of the native state 

Under some conditions proteins will not fold into their biochemically functional forms. 
Temperatures above or below the range that cells tend to live in will cause thermally 
unstable proteins to unfold or "denature" (this is why boiling makes an egg white turn 
opaque). High concentrations of solutes, extremes of pH, mechanical forces, and the 
presence of chemical denaturants can do the same. A fully denatured protein lacks both 
tertiary and secondary structure, and exists as a so-called random coil. Under certain 
conditions some proteins can refold; however, in many cases denaturation is 
irreversible. Cells sometimes protect their proteins against the denaturing influence of 
heat with enzymes known as chaperones or heat shock proteins, which assist other proteins 
both in folding and in remaining folded. Some proteins never fold in cells at all except with 
the assistance of chaperone molecules, which either isolate individual proteins so that their 
folding is not interrupted by interactions with other proteins or help to unfold misfolded 
proteins, giving them a second chance to refold properly. This function is crucial to prevent 
the risk of precipitation into insoluble amorphous aggregates. 

Incorrect protein folding and neurodegenerative disease 

Aggregated proteins are associated with prion-related illnesses such as Creutzfeldt-Jakob 
disease, bovine spongiform encephalopathy (mad cow disease), amyloid-related illnesses 
such as Alzheimer's Disease and familial amyloid cardiomyopathy or polyneuropathy, as 
well as intracytoplasmic aggregation diseases such as Huntington's and Parkinson's 
disease. These age onset degenerative diseases are associated with the multimerization of 
misfolded proteins into insoluble, extracellular aggregates and/or intracellular inclusions 
including cross-beta sheet amyloid fibrils; it is not clear whether the aggregates are the 
cause or merely a reflection of the loss of protein homeostasis, the balance between 
synthesis, folding, aggregation and protein turnover. Misfolding and excessive degradation
instead of folding and function lead to a number of proteopathy diseases such as
antitrypsin-associated emphysema, cystic fibrosis and the lysosomal storage diseases,
where loss of function is the origin of the disorder. While protein replacement therapy has
historically been used to correct the latter disorders, an emerging approach is to use
pharmaceutical chaperones to fold mutated proteins to render them functional. Chris
Dobson, Jeffery W. Kelly, Dennis Selkoe, Stanley Prusiner, Peter T. Lansbury, William E.
Balch, Richard I. Morimoto, Susan L. Lindquist and Byron C. Caughey have all contributed 
to this emerging understanding of protein-misfolding diseases. 

Kinetics and the Levinthal Paradox 

The duration of the folding process varies dramatically depending on the protein of interest. 
When studied outside the cell, the slowest folding proteins require many minutes or hours 
to fold primarily due to proline isomerization, and must pass through a number of 
intermediate states, like checkpoints, before the process is complete. On the other hand,
very small single-domain proteins with lengths of up to a hundred amino acids typically fold
in a single step. Time scales of milliseconds are the norm and the very fastest known
protein folding reactions are complete within a few microseconds. 

The Levinthal paradox observes that if a protein were to fold by sequentially sampling all 
possible conformations, it would take an astronomical amount of time to do so, even if the 
conformations were sampled at a rapid rate (on the nanosecond or picosecond scale). Based 
upon the observation that proteins fold much faster than this, Levinthal then proposed that 
a random conformational search does not occur, and the protein must, therefore, fold 
through a series of meta-stable intermediate states. 
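
The size of the numbers involved can be checked with a short calculation. The figures used below (a chain of 100 residues, 3 conformations per residue, one conformation sampled per picosecond) are hypothetical round numbers chosen for illustration; they are not taken from the text.

```python
def levinthal_search_years(n_residues, states_per_residue, samples_per_second):
    """Time in years to visit every conformation once at the given sampling rate."""
    seconds_per_year = 3.15e7  # approximate
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / seconds_per_year


# Hypothetical inputs: 100 residues, 3 states each, picosecond sampling.
years = levinthal_search_years(100, 3, 1e12)
print(f"{years:.1e} years")
```

Even with these modest assumptions the exhaustive search takes far longer than the age of the universe (roughly 1.4e10 years), whereas real proteins fold in milliseconds to hours.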

Techniques for studying protein folding 

Circular Dichroism 

Circular dichroism is one of the most general and basic tools to study protein folding. 
Circular dichroism spectroscopy measures the absorption of circularly polarized light. In 
proteins, structures such as alpha helices and beta sheets are chiral, and thus absorb such
light. The absorption of this light acts as a marker of the degree of foldedness of the protein
ensemble. This technique can be used to measure equilibrium unfolding of the protein by
measuring the change in this absorption as a function of denaturant concentration or
temperature. A denaturant melt measures the free energy of unfolding as well as the
protein's m value, or denaturant dependence. A temperature melt measures the melting
temperature ($T_m$) of the protein. This type of spectroscopy can also be combined with
fast-mixing devices, such as stopped flow, to measure protein folding kinetics and to 
generate chevron plots. 
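
How the free energy of unfolding and the m value shape a denaturant melt can be sketched with the commonly used two-state linear extrapolation model, in which the unfolding free energy falls linearly with denaturant concentration. The model choice and the parameter values (20 kJ/mol, 5 kJ/mol per molar) are assumptions for illustration, not values from the text.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol K)


def fraction_unfolded(denaturant_molar, dg_water, m_value, temperature_k=298.15):
    """Two-state model: dG = dg_water - m_value * [denaturant], K = exp(-dG / RT)."""
    dg = dg_water - m_value * denaturant_molar
    k_unfold = math.exp(-dg / (R_KJ * temperature_k))
    return k_unfold / (1.0 + k_unfold)


dg_water = 20.0  # hypothetical unfolding free energy in water, kJ/mol
m_value = 5.0    # hypothetical m value, kJ/mol per molar denaturant
midpoint = dg_water / m_value  # concentration at which half the protein is unfolded

for conc in (0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
    print(conc, round(fraction_unfolded(conc, dg_water, m_value), 4))
```

In an experiment the procedure runs in the opposite direction: the measured signal is converted to a fraction unfolded at each concentration, and the two parameters are obtained by fitting this curve.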

Dual Polarisation Interferometry 

Dual Polarisation Interferometry is a relatively new benchtop technique for measuring the 
overall change in protein size and fold density during interactions or other stimuli. The
technique captures a layer of protein on a glass slide and, using two polarisations of light,
measures the conformation and conformational changes with a time resolution of circa
10 Hz at a dimensional resolution of 0.01 nm. The method is quantitative and can be
compared directly to what one would expect of crystallography data.

Vibrational circular dichroism of proteins

The more recent developments of vibrational circular dichroism (VCD) techniques for
proteins, currently involving Fourier transform (FT) instruments, provide powerful means
for determining protein conformations in solution even for very large protein molecules.
Such VCD studies of proteins are often combined with X-ray diffraction of protein crystals,
FT-IR data for protein solutions in heavy water (D₂O), or ab initio quantum computations to
provide unambiguous structural assignments that are unobtainable from CD.

Modern studies of folding with high time resolution 

The study of protein folding has been greatly advanced in recent years by the development 
of fast, time-resolved techniques. These are experimental methods for rapidly triggering the 
folding of a sample of unfolded protein, and then observing the resulting dynamics. Fast 
techniques in widespread use include neutron scattering, ultrafast mixing of solutions,
photochemical methods, and laser temperature jump spectroscopy. Among the many 
scientists who have contributed to the development of these techniques are Jeremy Cook, 
Heinrich Roder, Harry Gray, Martin Gruebele, Brian Dyer, William Eaton, Sheena Radford, 
Chris Dobson, Sir Alan R. Fersht and Bengt Nolting. 

Energy landscape theory of protein folding 

The protein folding phenomenon was largely an experimental endeavor until the 
formulation of energy landscape theory by Joseph Bryngelson and Peter Wolynes in the late 
1980s and early 1990s. This approach introduced the principle of minimal frustration, 
which asserts that evolution has selected the amino acid sequences of natural proteins so 
that interactions between side chains largely favor the molecule's acquisition of the folded 
state. Interactions that do not favor folding are selected against, although some residual 
frustration is expected to exist. A consequence of these evolutionarily selected sequences is 
that proteins are generally thought to have globally "funneled energy landscapes" (a term
coined by José Onuchic) that are largely directed towards the native state. This
"folding funnel" landscape allows the protein to fold to the native state through any of a 
large number of pathways and intermediates, rather than being restricted to a single 
mechanism. The theory is supported by both computational simulations of model proteins 
and numerous experimental studies, and it has been used to improve methods for protein 
structure prediction and design. 

Computational prediction of protein tertiary structure 

De novo or ab initio techniques for computational protein structure prediction are related to,
but strictly distinct from, studies of protein folding. Molecular Dynamics (MD) is an
important tool for studying protein folding and dynamics in silico. Because of computational 
cost, ab initio MD folding simulations with explicit water are limited to peptides and very 
small proteins. MD simulations of larger proteins remain restricted to dynamics of the 
experimental structure or its high-temperature unfolding. In order to simulate long time 
folding processes (beyond about 1 microsecond), like folding of small-size proteins (about 
50 residues) or larger, some approximations or simplifications in protein models need to be 
introduced. An approach using a reduced protein representation (pseudo-atoms representing
groups of atoms are defined) and a statistical potential is not only useful in protein structure
prediction, but is also capable of reproducing the folding pathways.
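
The reduced representation mentioned above can be illustrated with a minimal Python sketch that replaces the atoms of each residue with a single pseudo-atom at their centroid. The input format (tuples of residue index and x, y, z coordinates), the function name and the toy coordinates are assumptions made for illustration only; practical coarse-grained models often use C-alpha positions or mass-weighted centres instead, and they are paired with a statistical potential that is not shown here.

```python
from collections import defaultdict


def coarse_grain(atoms):
    """Map each residue to one pseudo-atom at the centroid of its atoms.

    `atoms` is an iterable of (residue_index, x, y, z) tuples; this input
    format is hypothetical and chosen only for the example.
    Returns a dict {residue_index: (x, y, z)}.
    """
    # Running coordinate sums and atom count per residue.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for res, x, y, z in atoms:
        acc = sums[res]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {res: (sx / n, sy / n, sz / n)
            for res, (sx, sy, sz, n) in sums.items()}


# Toy input: two residues with two atoms each (made-up coordinates).
atoms = [
    (1, 0.0, 0.0, 0.0),
    (1, 2.0, 0.0, 0.0),
    (2, 4.0, 1.0, 0.0),
    (2, 4.0, 3.0, 0.0),
]
pseudo_atoms = coarse_grain(atoms)
print(pseudo_atoms)
```

Reducing the number of interaction sites in this way is what makes longer simulated time scales affordable, at the cost of atomic detail.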

There are distributed computing projects which use the idle CPU time of personal computers to
solve problems such as protein folding or prediction of protein structure. People can run
these programs on their computer or PlayStation 3 to support them. See the links below (for
example Folding@Home) to get information about how to participate in these projects.

Experimental techniques of protein structure determination 

Folded structures of proteins are routinely determined by X-ray crystallography and NMR. 

See also 

Anfinsen's dogma 

Chevron plot 

Denaturation (biochemistry) 

Denaturation midpoint 

Downhill folding 

Equilibrium unfolding 

Folding (chemistry) 

Folding@Home 

Foldit computer game 

Levinthal paradox 

Protein design 

Protein dynamics 

Protein structure prediction 

Protein structure prediction software 

Rosetta@Home 

Software for molecular mechanics modeling 

Protein-protein interaction

Protein-protein interactions involve not only the direct-contact association of protein
molecules but also longer range interactions through the electrolyte, aqueous solution
medium surrounding neighbor hydrated proteins over distances from less than one
nanometer to distances of several tens of nanometers. Furthermore, such protein-protein
interactions are thermodynamically linked functions of dynamically bound ions and water
that exchange rapidly with the surrounding solution by comparison with the molecular
tumbling rate (or correlation times) of the interacting proteins. Protein associations are also
studied from the perspectives of biochemistry, quantum chemistry, molecular dynamics,
signal transduction and other metabolic or genetic/epigenetic networks. Indeed,
protein-protein interactions are at the core of the entire Interactomics system of any living
cell.

The interactions between proteins are important for many, if not all, biological
functions. For example, signals from the exterior of a cell are mediated to the inside of that 
cell by protein-protein interactions of the signaling molecules. This process, called signal 
transduction, plays a fundamental role in many biological processes and in many diseases 
(e.g. cancers). Proteins might interact for a long time to form part of a protein complex, a 
protein may be carrying another protein (for example, from cytoplasm to nucleus or vice 
versa in the case of the nuclear pore importins), or a protein may interact briefly with 
another protein just to modify it (for example, a protein kinase will add a phosphate to a 
target protein). This modification of proteins can itself change protein-protein interactions. 
For example, some proteins with SH2 domains only bind to other proteins when they are 
phosphorylated on the amino acid tyrosine while bromodomains specifically recognise 
acetylated lysines. In conclusion, protein-protein interactions are of central importance for 
virtually every process in a living cell. Information about these interactions improves our 
understanding of diseases and can provide the basis for new therapeutic approaches. 

Methods to investigate protein-protein interactions 

Biochemical methods 

As protein-protein interactions are so important, there is a multitude of methods to detect
them. Each of the approaches has its own strengths and weaknesses, especially with regard 
to the sensitivity and specificity of the method. A high sensitivity means that many of the 
interactions that occur in reality are detected by the screen. A high specificity indicates 
that most of the interactions detected by the screen are also occurring in reality. 
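
The two quality measures defined above can be computed by comparing the pairs reported by a screen with a reference set of known interactions. The sketch below is illustrative only: the function name and the toy protein labels are assumptions, and a complete reference set of "real" interactions is rarely available in practice. Note that the second quantity, the fraction of detected interactions that are real, is usually called precision (positive predictive value) in statistics.

```python
def screen_quality(detected, true_interactions):
    """Return (sensitivity, precision) of a screen against a reference set.

    Interactions are unordered pairs, so ("A", "B") equals ("B", "A").
    """
    detected = {frozenset(pair) for pair in detected}
    truth = {frozenset(pair) for pair in true_interactions}
    true_positives = len(detected & truth)
    # Sensitivity: share of the real interactions that the screen found.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # Precision: share of the reported interactions that are real.
    precision = true_positives / len(detected) if detected else 0.0
    return sensitivity, precision


# Hypothetical reference interactions and screen output.
truth = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
detected = [("B", "A"), ("A", "C"), ("A", "E")]
sens, prec = screen_quality(detected, truth)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy case the screen recovers two of the four real interactions and reports one pair that is not real.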

• Co-immunoprecipitation is considered to be the gold standard assay for protein-protein 
interactions, especially when it is performed with endogenous (not overexpressed and 
not tagged) proteins. The protein of interest is isolated with a specific antibody. 
Interaction partners which stick to this protein are subsequently identified by western 
blotting. Interactions detected by this approach are considered to be real. However, this 
method can only verify interactions between suspected interaction partners. Thus, it is 
not a screening approach. A note of caution also is that immunoprecipitation experiments 
reveal direct and indirect interactions. Thus, positive results may indicate that two 
proteins interact directly or may interact via a bridging protein. 

• Bimolecular Fluorescence Complementation (BiFC) is a new technique for observing the
interactions of proteins. Combined with other new techniques, this method can be used
to screen protein-protein interactions and their modulators.

• Affinity electrophoresis is used for estimation of binding constants, as for instance in
lectin affinity electrophoresis or characterization of molecules with specific features like 
glycan content or ligand binding. 

• Pull-down assays are a common variation of immunoprecipitation and 
immunoelectrophoresis and are used identically, although this approach is more 
amenable to an initial screen for interacting proteins. 

• Label transfer can be used for screening or confirmation of protein interactions and can 
provide information about the interface where the interaction takes place. Label transfer 
can also detect weak or transient interactions that are difficult to capture using other in 
vitro detection strategies. In a label transfer reaction, a known protein is tagged with a 
detectable label. The label is then passed to an interacting protein, which can then be 
identified by the presence of the label. 

• The yeast two-hybrid screen investigates the interaction between artificial fusion 
proteins inside the nucleus of yeast. This approach can identify binding partners of a 
protein in an unbiased manner. However, the method has a notoriously high false-positive
rate which makes it necessary to verify the identified interactions by 
co-immunoprecipitation. 

• In-vivo crosslinking of protein complexes using photo-reactive amino acid analogs was 
introduced in 2005 by researchers from the Max Planck Institute. In this method, cells
are grown with photoreactive diazirine analogs to leucine and methionine, which are 
incorporated into proteins. Upon exposure to ultraviolet light, the diazirines are activated 
and bind to interacting proteins that are within a few angstroms of the photo-reactive 
amino acid analog. 

• The tandem affinity purification (TAP) method allows high-throughput identification of
protein interactions. In contrast to the Y2H approach, the accuracy of the method is
comparable to that of small-scale experiments (Collins et al., 2007), and the interactions
are detected within the correct cellular environment, as with co-immunoprecipitation.
However, the TAP tag method requires two successive steps of protein purification and
consequently cannot readily detect transient protein-protein interactions. Recent
genome-wide TAP experiments were performed by Krogan et al., 2006 and Gavin et al.,
2006, providing updated protein interaction data for yeast.

• Chemical crosslinking is often used to "fix" protein interactions in place before trying to 
isolate/identify interacting proteins. Common crosslinkers for this application include the 
non-cleavable NHS-ester crosslinker, bis-sulfosuccinimidyl suberate (BS3); a cleavable
version of BS3, dithiobis(sulfosuccinimidyl propionate) (DTSSP); and the imidoester 
crosslinker dimethyl dithiobispropionimidate (DTBP) that is popular for fixing 
interactions in ChIP assays. 

• Chemical crosslinking followed by high mass MALDI mass spectrometry can be used to 
analyze intact protein interactions in place before trying to isolate/identify interacting 
proteins. This method detects interactions among non-tagged proteins and is available 
from CovalX. 

• SPINE (Strep-protein interaction experiment) uses a combination of reversible 
crosslinking with formaldehyde and an incorporation of an affinity tag to detect 
interaction partners in vivo. 

• Quantitative immunoprecipitation combined with knock-down (QUICK) relies on
co-immunoprecipitation, quantitative mass spectrometry (SILAC) and RNA interference
(RNAi). This method detects interactions among endogenous non-tagged proteins.
Thus, it has the same high confidence as co-immunoprecipitation. However, this method
also depends on the availability of suitable antibodies.

Physical/Biophysical and Theoretical methods 

• Dual Polarisation Interferometry (DPI) can be used to measure protein-protein 
interactions. DPI provides real-time, high-resolution measurements of molecular size, 
density and mass. While tagging is not necessary, one of the protein species must be 
immobilized on the surface of a waveguide. 

• Static Light scattering (SLS) measures changes in the Rayleigh scattering of protein
complexes in solution and can non-destructively characterize both weak and strong
interactions without tagging or immobilization of the protein. The measurement consists
of mixing a series of aliquots of different concentrations or compositions with the analyte,
measuring the changes in light scattering that result from the interaction, and
fitting the correlated light scattering changes with concentration to a model. Weak,
non-specific interactions are typically characterized via the second virial coefficient. This
type of analysis can determine the equilibrium association constant for associated
complexes. Additional light scattering methods for protein activity determination
were previously developed by Timasheff. More recent Dynamic Light scattering (DLS)
methods for proteins, reported by H. Chou, are also applicable at high protein
concentrations and in protein gels; DLS may thus also be applicable for in vivo
cytoplasmic observations of various protein-protein interactions.

• Surface plasmon resonance can be used to measure protein-protein interaction. 

• With fluorescence correlation spectroscopy (FCS), one protein is labeled with a fluorescent
dye and the other is left unlabeled. The two proteins are then mixed, and the data yield the
fractions of the labeled protein that are unbound and bound to the other protein, giving
a measure of the dissociation constant (Kd) and binding affinity. Time-course
measurements can also be taken to characterize binding kinetics. FCS also reports the size
of the formed complexes, so the stoichiometry of binding can be measured. A more powerful
method is fluorescence cross-correlation spectroscopy (FCCS), which employs double labeling
techniques and cross-correlation, resulting in vastly improved signal-to-noise ratios over
FCS. Furthermore, two-photon and three-photon excitation practically eliminates
photobleaching effects and provides ultra-fast recording of FCCS or FCS data.

• Fluorescence resonance energy transfer (FRET) is a common technique when observing 
the interactions of only two different proteins.

• Protein activity determination by NMR multi-nuclear relaxation measurements, or 2D-FT 
NMR spectroscopy in solutions, combined with nonlinear regression analysis of NMR 
relaxation or 2D-FT spectroscopy data sets. Whereas the concept of water activity is
widely known and utilized in the applied biosciences, its complement, the protein activity,
which quantitates protein-protein interactions, is much less familiar to bioscientists as it
is more difficult to determine in dilute solutions of proteins; protein activity is also much
harder to determine for concentrated protein solutions when protein aggregation, not
merely transient protein association, is often the dominant process.

• Theoretical modeling of protein-protein interactions involves a detailed physical
chemistry/thermodynamic understanding of several effects involved, such as
intermolecular forces, ion-binding, proton fluctuations and proton exchange. The theory
of thermodynamically linked functions is one such example in which ion-binding and
protein-protein interactions are treated as linked processes; this treatment is especially
important for proteins that have enzymatic activity which depends on cofactor ions
dynamically bound at the enzyme active site, as for example, in the case of the
oxygen-evolving enzyme system (OES) in photosynthetic biosystems, where the oxygen
molecule binding is linked to the chloride anion binding as well as the linked state
transition of the manganese ions present at the active site in Photosystem II (PSII).
Another example of thermodynamically linked functions of ions and protein activity is
the binding of divalent calcium and magnesium cations to myosin in mechanical energy
transduction in muscle. Last but not least, chloride ion and oxygen binding to hemoglobin
(from several mammalian sources, including human) is a very well-known example of
such thermodynamically linked functions for which a detailed and precise theory has
already been developed.

• Molecular dynamics (MD) computations of protein-protein interactions. 

• Protein-protein docking: the prediction of protein-protein interactions based only on the
three-dimensional protein structures from X-ray diffraction of protein crystals, which
might not be satisfactory. [9] [10]
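
As a worked illustration of how bound fractions, such as those obtained from fluorescence correlation spectroscopy in the list above, lead to a dissociation constant, the following sketch assumes simple 1:1 binding with the unlabeled partner B in large excess, so that the bound fraction is f = [B] / (Kd + [B]). The function names and the concentration values are made up for the example, and the double-reciprocal linearization used here is chosen for brevity; with noisy data a nonlinear fit would normally be preferred.

```python
def fraction_bound(b_conc, kd):
    """Bound fraction of labeled protein for 1:1 binding, partner in excess."""
    return b_conc / (kd + b_conc)


def estimate_kd(b_concs, fractions):
    """Least-squares Kd from the linearization 1/f - 1 = Kd * (1/[B]).

    Fits a straight line through the origin; requires [B] > 0 and f > 0.
    """
    num = 0.0
    den = 0.0
    for b, f in zip(b_concs, fractions):
        x = 1.0 / b          # reciprocal partner concentration
        y = 1.0 / f - 1.0    # equals Kd / [B] for ideal 1:1 binding
        num += x * y
        den += x * x
    return num / den


# Synthetic, noise-free titration (concentrations in uM, values made up).
concs = [0.5, 1.0, 2.0, 4.0, 8.0]
true_kd = 2.0
fractions = [fraction_bound(c, true_kd) for c in concs]
kd_est = estimate_kd(concs, fractions)
print(f"estimated Kd = {kd_est:.3f} uM")
```

At [B] equal to Kd exactly half of the labeled protein is bound, which gives a quick visual check on a titration curve.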

Network visualization of protein-protein interactions 

Visualization of protein-protein interaction networks is a popular application of scientific 
visualization techniques. Although protein interaction diagrams are common in textbooks, 
diagrams of whole cell protein interaction networks were not as common since the level of 
complexity made them difficult to generate. One example of a manually produced molecular 
interaction map is Kurt Kohn's 1999 map of cell cycle control. Drawing on Kohn's map, 
in 2000 Schwikowski, Uetz, and Fields published a paper on protein-protein interactions in 
yeast, linking together 1,548 interacting proteins determined by two-hybrid testing. They 
used a force-directed (Sugiyama) graph drawing algorithm to automatically generate an 
image of their network. [12] [13] [14]
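
To show what a force-directed layout does, here is a minimal sketch in the style of the Fruchterman-Reingold algorithm: all nodes repel each other, edges pull their endpoints together, and a decreasing "temperature" limits how far a node may move per step. This is an assumption-laden illustration, not the procedure used in the paper cited above; the function name, parameter values and protein labels are hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple spring-style layout."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # preferred distance in a unit square
    temperature = 0.1                # maximum step length, cooled over time
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes: k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attractive force along each edge: distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.98
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical four-protein chain of pairwise interactions.
proteins = ["P1", "P2", "P3", "P4"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
layout = force_directed_layout(proteins, interactions)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

The pairwise repulsion loop scales quadratically with the number of nodes, so networks with thousands of proteins generally call for a graph library with an optimized layout routine.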

An experimental view of Kurt Kohn's 1999 map. The image was merged using GIMP
2.2.17 and then uploaded to maplib.net.

See also 

Interactomics 

Signal transduction 

Biophysical techniques 

Biochemistry methods 

Genomics 

Complex systems biology 

Complex systems 

Immunoprecipitation 

Protein-protein interaction prediction 

Protein-protein interaction screening 

BioGRID, a public repository for protein and genetic interactions 

Database of Interacting Proteins (DIP) 

NCIBI National Center for Integrative Biomedical Informatics 

• Biotechnology

• Protein nuclear magnetic resonance spectroscopy 

• 2D-FT NMRI and Spectroscopy 

• Fluorescence correlation spectroscopy 

• Fluorescence cross-correlation spectroscopy 

• Light scattering 

• ConsensusPathDB 

External links

• National Center for Integrative Biomedical Informatics (NCIBI) (http://portal.ncibi.org/gateway/)

• Proteins and Enzymes (http://www.dmoz.org/Science/Biology/BiochemistryandMolecularBiology/Biomolecules/ProteinsandEnzymes/) at the
Open Directory Project

• FLIM Applications (http://www.nikoninstruments.com/infocenter.php?n=FLIM). FLIM
is also often used in microspectroscopic/chemical imaging, or microscopic, studies to
monitor spatial and temporal protein-protein interactions, properties of membranes and 
interactions with nucleic acids in living cells. 

• Arabidopsis thaliana protein interaction network (http://bioinfo.esalq.usp.br/atpin) 



DNA Dynamics 



DNA Molecular dynamics modeling involves simulations of DNA molecular geometry 
and topology changes with time as a result of both intra- and inter- molecular interactions 
of DNA. Whereas molecular models of Deoxyribonucleic acid (DNA) molecules such as 
closely packed spheres (CPK models) made of plastic or metal wires for 'skeletal models' 
are useful representations of static DNA structures, their usefulness is very limited for 
representing complex DNA dynamics. Computer molecular modeling allows both 
animations and molecular dynamics simulations that are very important for understanding 
how DNA functions in vivo. 

A long-standing dynamic problem is how DNA "self-replication", which should involve
transient uncoiling of supercoiled DNA fibers, takes place in living cells. Although DNA
consists of relatively rigid, very large elongated biopolymer molecules called "fibers" or
chains, its molecular structure in vivo undergoes dynamic configuration changes that involve
dynamically attached water molecules, ions or proteins/enzymes. Supercoiling, packing 
with histones in chromosome structures, and other such supramolecular aspects also 
involve in vivo DNA topology which is even more complex than DNA molecular geometry, 
thus turning molecular modeling of DNA dynamics into a series of challenging problems for 
biophysical chemists, molecular biologists and biotechnologists. Thus, DNA exists in 
multiple stable geometries (called conformational isomerism) and has a rather large 
number of configurational, quantum states which are close to each other in energy on the 
potential energy surface of the DNA molecule. 

Such varying molecular geometries can also be computed, at least in principle, by 
employing ab initio quantum chemistry methods that can attain high accuracy for small 
molecules, although claims that acceptable accuracy can be also achieved for 
polynucleotides, as well as DNA conformations, were recently made on the basis of VCD 
spectral data. Such quantum geometries define an important class of ab initio molecular 
models of DNA whose exploration has barely started especially in connection with results 
obtained by VCD in solutions. More detailed comparisons with such ab initio quantum 
computations are in principle obtainable through 2D-FT NMR spectroscopy and relaxation 
studies of polynucleotide solutions or specifically labeled DNA, as for example with 
deuterium labels. 

Importance of DNA molecular structure and dynamics modeling for Genomics and beyond

From the very early stages of structural studies of DNA by X-ray diffraction and 
biochemical means, molecular models such as the Watson-Crick double-helix model were 
successfully employed to solve the 'puzzle' of DNA structure, and also find how the latter 
relates to its key functions in living cells. The first high quality X-ray diffraction patterns of 
A-DNA were reported by Rosalind Franklin and Raymond Gosling in 1953. The first
reports of a double-helix molecular model of B-DNA structure were made by Watson and
Crick in 1953 [2] [3]. Then Maurice F. Wilkins, A. Stokes and H.R. Wilson reported the first
X-ray patterns of in vivo B-DNA in partially oriented salmon sperm heads. The
development of the first correct double-helix molecular model of DNA by Crick and Watson
may not have been possible without the biochemical evidence for the nucleotide
base-pairing ([A-T]; [C-G]), or Chargaff's rules [5] [6] [7] [8] [9] [10]. Although such initial
studies of DNA structures with the help of molecular models were essentially static, their 
consequences for explaining the in vivo functions of DNA were significant in the areas of 
protein biosynthesis and the quasi-universality of the genetic code. Epigenetic 
transformation studies of DNA in vivo were however much slower to develop in spite of 
their importance for embryology, morphogenesis and cancer research. Such chemical 
dynamics and biochemical reactions of DNA are much more complex than the molecular 
dynamics of DNA physical interactions with water, ions and proteins/enzymes in living cells. 

Animated DNA molecular models and hydrogen-bonding 

Animated molecular models allow one to visually explore the three-dimensional (3D)
structure of DNA. The first DNA model is a space-filling, or CPK, model of the DNA
double-helix whereas the third is an animated wire, or skeletal type, molecular model of
DNA. The last two DNA molecular models in this series depict quadruplex DNA that
may be involved in certain cancers. The first CPK model in the second row is a
molecular model of hydrogen bonds between water molecules in ice that are broadly similar
to those found in DNA; the hydrogen bonding dynamics and proton exchange, however,
differ by many orders of magnitude between the two systems of fully hydrated DNA
and water molecules in ice. Thus, the DNA dynamics is complex, involving nanosecond and
several tens of picosecond time scales, whereas that of liquid water is on the picosecond time
scale, and that of proton exchange in ice is on the millisecond time scale; the proton
exchange rates in DNA and attached proteins may vary from picoseconds to nanoseconds,
minutes or years, depending on the exact locations of the exchanged protons in the large
biopolymers. The simple harmonic oscillator 'vibration' in the third, animated image of the
next gallery is only an oversimplified dynamic representation of the longitudinal vibrations
of the DNA intertwined helices, which were found to be anharmonic rather than harmonic as
often assumed in quantum dynamic simulations of DNA.
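
The difference between a harmonic and an anharmonic vibration can be made concrete by comparing two one-dimensional potentials. The sketch below uses the Morse potential as the anharmonic example; this choice is an assumption made for illustration, since the text does not say which anharmonic form describes the DNA helix vibrations, and the force constant and well depth are arbitrary numbers in arbitrary units.

```python
import math


def harmonic(x, k):
    """Harmonic potential V = k * x^2 / 2 for displacement x."""
    return 0.5 * k * x * x


def morse(x, k, depth):
    """Morse potential V = D * (1 - exp(-a x))^2.

    The width parameter a = sqrt(k / (2 D)) is chosen so that the curvature
    at x = 0 equals the harmonic force constant k.
    """
    a = math.sqrt(k / (2.0 * depth))
    return depth * (1.0 - math.exp(-a * x)) ** 2


k, depth = 1.0, 1.0  # arbitrary illustrative values
for x in (-1.0, 0.01, 2.0):
    print(f"x={x:+.2f}  harmonic={harmonic(x, k):.5f}  morse={morse(x, k, depth):.5f}")
```

Near the minimum the two curves agree, but on stretching the Morse energy levels off at the well depth while the harmonic energy keeps growing, and on compression the Morse wall is steeper; this asymmetry is what "anharmonic" refers to.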

Human Genomics and Biotechnology Applications of DNA Molecular Modeling

The following two galleries of images illustrate various uses of DNA molecular modeling in 
Genomics and Biotechnology research applications from DNA repair to PCR and DNA 
nanostructures; each slide contains its own explanation and/or details. The first slide 
presents an overview of DNA applications, including DNA molecular models, with emphasis 
on Genomics and Biotechnology. 

Applications of DNA molecular dynamics computations 

• First row images present a DNA biochip and DNA nanostructures designed for DNA 
computing and other dynamic applications of DNA nanotechnology; last image in this row 
is of DNA arrays that display a representation of the Sierpinski gasket on their surfaces. 

• Second row: the first two images show computer molecular models of RNA polymerase,
followed by that of an E. coli bacterial DNA primase template, suggesting very complex
dynamics at the interfaces between the enzymes and the DNA template; the fourth image
illustrates in a computed molecular model the mutagenic, chemical interaction of a
potent carcinogen molecule with DNA, and the last image shows the different
interactions of specific fluorescence labels with DNA in human and orangutan
chromosomes.







Image Gallery: DNA Applications and Technologies at various scales 
in Biotechnology and Genomics research 

The first figure is an actual electron micrograph of a DNA fiber bundle, presumably of a 
single plasmid, bacterial DNA loop. 

Databases for Genomics, DNA Dynamics and Sequencing



Genomic and structural databases 

• CBS Genome Atlas Database — contains examples of base skews. 

• The Z curve database of genomes: a 3-dimensional visualization and analysis tool of
genomes [59] [14].

• DNA and other nucleic acids' molecular models: Coordinate files of nucleic acids 
molecular structure models in PDB and CIF formats 

DNA Dynamics Data from Spectroscopy

• FT-NMR [15] [16]

• NMR Atlas-database [29]

• mmcif downloadable coordinate files of nucleic acids in solution from 2D-FT NMR data [30]

• NMR constraints files for NAs in PDB format [31]

• NMR microscopy

• Vibrational circular dichroism (VCD)

• Microwave spectroscopy

• FT-IR

• FT-NIR [18] [19] [20]

• Spectral, Hyperspectral, and Chemical imaging [21] [22] [23] [24] [25] [26] [27]

• Raman spectroscopy/microscopy and CARS

• Fluorescence correlation spectroscopy [30] [31] [32] [33] [34] [35] [36] [37], Fluorescence
cross-correlation spectroscopy and FRET

• Confocal microscopy [41]

Gallery: CARS (Raman spectroscopy), Fluorescence confocal microscopy, and Hyperspectral imaging

X-ray microscopy

• Application of X-ray microscopy in the analysis of living hydrated cells [18]



Atomic Force Microscopy (AFM) 

Two-dimensional DNA junction arrays have been visualized by Atomic Force Microscopy
(AFM). Other imaging resources for AFM/Scanning probe microscopy (SPM) can be
freely accessed at:

• How SPM Works [25] 

• SPM Image Gallery - AFM STM SEM MFM NSOM and more. [26] 

Gallery of AFM Images of DNA Nanostructures 

External links

• DNAlive: a web interface to compute DNA physical properties (http://mmb.pcb.ub.es/DNAlive).
Also allows cross-linking of the results with the UCSC Genome browser and DNA dynamics.

• Application of X-ray microscopy in analysis of living hydrated cells (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12379938)

• DiProDB: Dinucleotide Property Database (http://diprodb.fli-leibniz.de). The database
is designed to collect and analyse thermodynamic, structural and other dinucleotide
properties.

• DNA the Double Helix Game (http://nobelprize.org/educational_games/medicine/dnadoublehelix/).
From the official Nobel Prize web site.

• MDDNA: Structural Bioinformatics of DNA (http://humphry.chem.wesleyan.edu:8080/MDDNA/)

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/), National Centre
for Biotechnology Education

• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html)

• Further details of mathematical and molecular analysis of DNA structure based on X-ray
data (http://planetphysics.org/encyclopedia/BesselFunctionsApplicationsToDiffractionByHelicalStructures.html)

• Bessel functions corresponding to Fourier transforms of atomic or molecular helices
(http://planetphysics.org/?op=getobj&from=objects&name=BesselFunctionsAndTheirApplicationsToDiffractionByHelicalStructures)

• Characterization in nanotechnology: some PDFs (http://nanocharacterization.sitesled.com/)

• An overview of STM/AFM/SNOM principles with educative videos (http://www.ntmdt.ru/SPM-Techniques/Principles/)

• SPM Image Gallery - AFM STM SEM MFM NSOM and More (http://www.rhk-tech.com/results/showcase.php)

• How SPM Works (http://www.parkafm.com/New_html/resources/01general.php)

• U.S. National DNA Day (http://www.genome.gov/10506367): watch videos and
participate in real-time discussions with scientists.

• The Secret Life of DNA - DNA Music compositions (http://www.tjmitchell.com/stuart/dna.html)

• Ascalaph DNA (http://www.agilemolecule.com/Ascalaph/Ascalaph_DNA.html):
commercial software for DNA modeling



DNA nanotechnology



DNA nanotechnology is a subfield of nanotechnology which seeks to use the unique 
molecular recognition properties of DNA and other nucleic acids to create novel, 
controllable structures out of DNA. The DNA is thus used as a structural material rather 
than as a carrier of genetic information, making it an example of bionanotechnology. This 
has possible applications in molecular self-assembly and in DNA computing. 

Introduction: DNA crossover molecules 





Structure of the 4-arm junction. Left: A schematic. Right: A more realistic model.
Each of the four separate DNA single strands is shown in a different color.

DNA nanotechnology makes use of branched DNA structures to
create DNA complexes with useful properties. DNA is normally a 
linear molecule, in that its axis is unbranched. However, DNA 
molecules containing junctions can also be made. For example, a 
four-arm junction can be made using four individual DNA strands 
which are complementary to each other in the correct pattern. Due to 
Watson-Crick base pairing, only portions of the strands which are 
complementary to each other will attach to each other to form duplex 
DNA. This four-arm junction is an immobile form of a Holliday 
junction. 
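
The pairing pattern of a four-arm junction can be sketched in a few lines of Python. In this illustration each strand is written 5' to 3' and consists of two halves; the 3' half of one strand pairs with the 5' half of the next strand to form one arm. The arm sequences, function names and the equal-length-halves convention are assumptions made for the example. The check covers only Watson-Crick complementarity; it does not test for the lack of sequence symmetry at the branch point that makes a junction immobile.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_junction(arm_sequences):
    """Build one strand per arm: strand i = rc(arm[i-1]) + arm[i].

    All arm sequences are assumed to have the same length.
    """
    n = len(arm_sequences)
    return [reverse_complement(arm_sequences[i - 1]) + arm_sequences[i]
            for i in range(n)]


def forms_junction(strands):
    """Check that each strand's 3' half pairs with the next strand's 5' half."""
    n = len(strands)
    for i in range(n):
        three_prime_half = strands[i][len(strands[i]) // 2:]
        nxt = strands[(i + 1) % n]
        five_prime_half = nxt[:len(nxt) // 2]
        if five_prime_half != reverse_complement(three_prime_half):
            return False
    return True


# Four made-up 8-nucleotide arm sequences.
arms = ["GCACGAGT", "CTGATACC", "TGCAATCC", "GGACATCG"]
strands = design_junction(arms)
for s in strands:
    print(s)
print("forms a four-arm junction:", forms_junction(strands))
```

Changing a single base in one strand breaks the complementarity of one arm, which is the sense in which the strands must be "complementary to each other in the correct pattern".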

Junctions can be used in more complex molecules. The most 
important of these is the "double-crossover" or DX motif. Here, two 
DNA duplexes lie next to each other, and share two junction points 
where strands cross from one duplex into the other. This molecule 
has the advantage that the junction points are now constrained to a 
single orientation as opposed to being flexible as in the four-arm 
junction. This makes the DX motif suitable as a structural building
block for larger DNA complexes. [3]




A double-crossover (DX) molecule. This molecule consists of five DNA single strands
which form two double-helical domains, on the left and the right in this image. There are
two crossover points where the strands cross from one domain into the other. Image from
Mao, 2004. [2]






Tile-based arrays

Assembly of a DX array. Each bar represents a double-helical domain of DNA, with the
shapes representing complementary sticky ends. The DX molecule at top will combine into
the two-dimensional DNA array shown at bottom. Image from Mao, 2004. [2]



DX arrays 

Double-crossover (DX) molecules can be equipped 
with sticky ends in order to combine them into a 
two-dimensional periodic lattice. Each DX molecule 
has four termini, one at each end of the two 
double-helical domains, and these can be equipped 
with sticky ends that program them to combine into 
a specific pattern. More than one type of DX can be 
used which can be made to arrange in rows or any 
other tessellated pattern. They thus form extended 
flat sheets which are essentially two-dimensional 
crystals of DNA. [4] 

DNA nanotubes 



In addition to flat sheets, DX arrays have been made to form hollow tubes of 4-20 nm 
diameter. These DNA nanotubes are somewhat similar in size and shape to carbon nanotubes, 
but the carbon nanotubes are stronger and better conductors, whereas the DNA nanotubes are 
more easily modified and connected to other structures. [5] 



Other tile arrays 

Two-dimensional arrays have been made out of other motifs as well, including the Holliday 
junction rhombus array as well as various DX-based arrays in the shapes of triangles and 
hexagons. Another motif, the six-helix bundle, has the ability to form three-dimensional 
DNA arrays as well. 

DNA origami 

As an alternative to the tile-based approach, two-dimensional DNA structures can be made 
from a single, long DNA strand of arbitrary sequence which is folded into the desired shape 
by using shorter, "staple" strands. This allows the creation of two-dimensional shapes at the 
nanoscale using DNA. Demonstrated designs have included the smiley face and a coarse 
map of North America. DNA origami was the cover story of Nature on March 16, 2006. 



DNA polyhedra 

A number of three-dimensional DNA molecules have been made which have the 
connectivity of a polyhedron such as an octahedron or cube. In other words, the DNA 
duplexes trace the edges of a polyhedron with a DNA junction at each vertex. The earliest 
demonstrations of DNA polyhedra involved multiple ligations and solid-phase synthesis 
steps to create catenated polyhedra. More recently, there have been demonstrations of a 
DNA truncated octahedron made from a long single strand designed to fold into the correct 
conformation, as well as a tetrahedron which can be produced from four DNA strands in a 
single step. 
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DNA nanomechanical devices 

DNA complexes have been made which change their conformation upon some stimulus. 
These are intended to have applications in nanorobotics. One of the first such devices, 
called "molecular tweezers," changes from an open to a closed state based upon the 
presence of control strands. 

DNA machines have also been made which show a twisting motion. One of these makes use 
of the transition between the B-DNA and Z-DNA forms to respond to a change in buffer 
conditions. Another relies on the presence of control strands to switch from a 
paranemic-crossover (PX) conformation to a double-junction (JX2) conformation. 

Stem Loop Controllers 

A design called a stem loop, consisting of a single strand of DNA which has a loop at one 
end, is a dynamic structure that opens and closes when a piece of DNA bonds to the loop 
part. This effect has been exploited to create several logic gates. [11] [12] These logic gates 
have been used to create the computers MAYA I and MAYA II, which can play tic-tac-toe to 
some extent. [13] 
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Applications 

Algorithmic self-assembly 

DNA nanotechnology has been applied to 
the related field of DNA computing. A DX 
array has been demonstrated whose 
assembly encodes an XOR operation, which 
allows the DNA array to implement a 
cellular automaton which generates a 
fractal called the Sierpinski gasket. This 
shows that computation can be 
incorporated into the assembly of DNA 
arrays, increasing its scope beyond simple 
periodic arrays. 
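
The link between an XOR rule and the Sierpinski gasket can be sketched in a few lines of Python. This is only an abstract illustration of the cellular automaton, not a model of the DNA tiles themselves: the function names, the grid width and the choice to seed a single 1 at the left edge are assumptions made for the example.

```python
def xor_rows(width: int, steps: int) -> list[list[int]]:
    """Grow rows in which each new cell is the XOR of two cells in the row above."""
    row = [0] * width
    row[0] = 1                      # single seeded 1 at the left edge
    rows = [row]
    for _ in range(steps - 1):
        prev = rows[-1]
        # Cell i depends on prev[i-1] and prev[i]; positions off the grid count as 0.
        new = [(prev[i - 1] if i > 0 else 0) ^ prev[i] for i in range(width)]
        rows.append(new)
    return rows


def render(rows: list[list[int]]) -> str:
    """Draw 1s as '#' and 0s as '.' so the fractal pattern is visible."""
    return "\n".join("".join("#" if cell else "." for cell in r) for r in rows)


print(render(xor_rows(16, 16)))
```

Each row is fully determined by the row before it, which mirrors how, in the tile picture, the two inputs presented by already-attached tiles determine which tile can bind next.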

Note that DNA computing overlaps with, but is distinct from, DNA nanotechnology. The 
latter uses the specificity of Watson-Crick base pairing to make novel structures out of DNA. 
These structures can be used for DNA computing, but they do not have to be. Additionally, 
DNA computing can be done without using the types of molecules made possible by DNA 
nanotechnology. 
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The Sierpinski gasket. 
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Nanoarchitecture 

The idea of using DNA arrays to 
template the assembly of other 
functional molecules has been 
around for a while, but only 
recently has progress been made 
in reducing these kinds of schemes 
to practice. In 2006, researchers 
covalently attached gold 

nanoparticles to a DX-based tile 
and showed that self-assembly of 
the DNA structures also assembled 
the nanoparticles hosted on them. 
A non-covalent hosting scheme 
was shown in 2007, using Dervan 
polyamides on a DX array to 
arrange streptavidin proteins on 
specific kinds of tiles on the DNA 
array. Previously, in 2006, LaBean demonstrated the letters 
"D", "N" and "A" created on a 4x4 DX array using streptavidin. [17] 

DNA arrays that display a representation of the Sierpinski gasket on their surfaces. Image 
from Rothemund et al., 2004. [14] 



DNA has also been used to assemble a single-walled carbon nanotube field-effect 
transistor. [18] 



See also 

• Mechanical properties of DNA 



External links 

Chengde Mao page at Purdue University [19] 

John Reif lab at Duke University [20] 

Nadrian Seeman lab at NYU [21] 

William M. Shih lab at Harvard Medical School [22] 

Andrew Turberfield lab at Oxford University [23] 

Erik Winfree lab at Caltech [24] 

Hao Yan lab at Arizona State University [25] 

Bernard Yurke formerly at Bell Labs [26] now at Boise State University [27] 

Thom LaBean at Duke University [28] 

Software for 3D DNA design, modeling and/or simulation: 

• Ascalaph Designer [29] 

• caDNAno [30] 

• GIDEON [31] 

• NanoEngineer-1 [32] 

International Society for Nanoscale Science, Computation and Engineering [33] 
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Molecular self-assembly 



Molecular self-assembly is the process by which molecules adopt a defined arrangement 
without guidance or management from an outside source. There are two types of 
self-assembly, intramolecular self-assembly and intermolecular self-assembly. Most often the 
term molecular self-assembly refers to intermolecular self-assembly, while the 
intramolecular analog is more commonly called folding. 

An example of a molecular self-assembly through hydrogen bonds reported by Meijer and 
coworkers. [1] 



Supramolecular Systems 

Molecular self-assembly is a key concept in supramolecular chemistry, since 
assembly of the molecules is directed through noncovalent interactions (e.g., hydrogen 
bonding, metal coordination, hydrophobic forces, van der Waals forces, π-π interactions, 
and/or electrostatic) as well as electromagnetic interactions. Common examples include the 
formation of micelles, vesicles, liquid crystal phases, and Langmuir monolayers by 
surfactant molecules. Further examples of supramolecular assemblies demonstrate that a 
variety of different shapes and sizes can be obtained using molecular self-assembly. 

Molecular self-assembly has allowed the construction of challenging molecular topologies. 
An example is the Borromean rings, a set of interlocking rings in which removing any one 
ring unlinks the others. DNA has been used to prepare a molecular analog of Borromean 
rings. More recently, a similar structure has been prepared using non-biological building 
blocks. [7] 



Biological Systems 

Molecular self-assembly is crucial to the function of cells. It is exhibited in the self-assembly 
of lipids to form the membrane, the formation of double helical DNA through hydrogen 
bonding of the individual strands, and the assembly of proteins to form quaternary 
structures. Molecular self-assembly of incorrectly folded proteins into insoluble amyloid 
fibers is responsible for infectious prion-related neurodegenerative diseases. 
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Nanotechnology 

Molecular self-assembly is an important aspect of bottom-up approaches to 
nanotechnology. Using molecular self-assembly, the final (desired) structure is programmed 
in the shape and functional groups of the molecules. Self-assembly is referred to as a 
'bottom-up' manufacturing technique in contrast to a 'top-down' technique such as 
lithography, where the desired final structure is carved from a larger block of matter. In the 
speculative vision of molecular nanotechnology, microchips of the future might be made by 
molecular self-assembly. An advantage of constructing nanostructures from biological 
materials using molecular self-assembly is that they will degrade back into individual 
molecules that can be broken down by the body. 







The DNA structure at left (schematic shown) will self-assemble into 
the structure visualized by atomic force microscopy at right. Image 
from Strong. 



DNA nanotechnology 

DNA nanotechnology is an area of current research that uses the bottom-up, self-assembly 
approach for nanotechnological goals. DNA nanotechnology uses the unique molecular 
recognition properties of DNA and other nucleic acids to create self-assembling branched 
DNA complexes with useful properties. DNA is thus used as a structural material rather 
than as a carrier of biological information, to make structures such as two-dimensional 
periodic lattices (both tile-based as well as using the "DNA origami" method) and 
three-dimensional structures in the shapes of polyhedra. These DNA structures have 
also been used to template the assembly of other molecules such as gold nanoparticles 
and streptavidin proteins. 
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See also 

• Supramolecular assembly 

• Supramolecular chemistry 
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Cell signaling 



Cell signaling is part of a complex system of communication that governs basic cellular 
activities and coordinates cell actions. [1] The ability of cells to perceive and correctly 
respond to their microenvironment is the basis of development, tissue repair, and immunity 
as well as normal tissue homeostasis. Errors in cellular information processing are 
responsible for diseases such as cancer, autoimmunity, and diabetes. By understanding cell 
signaling, diseases may be treated effectively and, theoretically, artificial tissues may be 
created. 

Traditional work in biology has focused on studying individual parts of cell signaling 
pathways. Systems biology research helps us to understand the underlying structure of cell 
signaling networks and how changes in these networks may affect the transmission and flow 
of information. Such networks are complex systems in their organization and may exhibit a 
number of emergent properties including bistability and ultrasensitivity. Analysis of cell 
signaling networks requires a combination of experimental and theoretical approaches 
including the development and analysis of simulations and modelling. 
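
As a small illustration of what such modelling can look like, the sketch below uses the Hill equation, a common phenomenological description of a switch-like (ultrasensitive) response. The Hill form, the parameter names `k_half` and `n`, and the numerical values are assumptions introduced for this example; they are not taken from any specific pathway discussed here.

```python
def hill_response(signal: float, k_half: float = 1.0, n: float = 1.0) -> float:
    """Fractional response to an input signal, using the Hill equation.

    k_half is the signal level giving a half-maximal response;
    n is the Hill coefficient (larger n gives a steeper, more switch-like curve).
    """
    return signal**n / (k_half**n + signal**n)


# Compare a graded response (n = 1) with an ultrasensitive one (n = 4)
# over the same four-fold change in input signal.
for n in (1, 4):
    low = hill_response(0.5, n=n)
    high = hill_response(2.0, n=n)
    print(f"n={n}: response rises from {low:.3f} to {high:.3f}")
```

With the larger Hill coefficient, the same change in input moves the response from close to off to close to fully on, which is the qualitative behaviour that the term ultrasensitivity refers to.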




Unicellular and multicellular organism cell signaling 

Cell signaling has been most extensively studied in the context of human diseases and 
signaling between cells of a single organism. However, cell signaling may also occur 
between the cells of two different organisms. In many mammals, early embryo cells 
exchange signals with cells of the uterus. In the human gastrointestinal tract, bacteria 
exchange signals with each other and with human epithelial and immune system cells. For 
the yeast Saccharomyces cerevisiae during mating, some cells send a peptide signal (mating 
factor pheromones) into their environment. The mating factor peptide may bind to a cell 
surface receptor on other yeast cells and induce them to prepare for mating. 

Figure 1. Example of signaling between bacteria. Salmonella enteritidis uses 
acyl-homoserine lactone for quorum sensing (see: Inter-Bacterial Communication). 
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Types of signals 




Figure 2. Notch-mediated juxtacrine signal between adjacent cells. 



Cells communicate with each other via direct contact 
(juxtacrine signaling), over short distances (paracrine 
signaling), or over large distances and/or scales 
(endocrine signaling). 

Some cell-to-cell communication requires direct cell-cell 
contact. Some cells can form gap junctions that connect 
their cytoplasm to the cytoplasm of adjacent cells. In 
cardiac muscle, gap junctions between adjacent cells 
allow action potentials to propagate from the cardiac 
pacemaker region and spread through the heart, 
causing coordinated contraction. 



The Notch signaling mechanism is an example of 
juxtacrine signaling (also known as contact-dependent 
signaling) in which two adjacent cells must make physical contact in order to communicate. 
This requirement for direct contact allows for very precise control of cell differentiation 
during embryonic development. In the worm Caenorhabditis elegans, two cells of the 
developing gonad each have an equal chance of terminally differentiating or becoming a 
uterine precursor cell that continues to divide. The choice of which cell continues to divide 
is controlled by competition of cell surface signals. One cell will happen to produce more of 
a cell surface protein that activates the Notch receptor on the adjacent cell. This activates a 
feedback loop or system that reduces Notch expression in the cell that will differentiate and 
increases Notch on the surface of the cell that continues as a stem cell. [6] 



Many cell signals are carried by molecules that are released by one cell and move to make 
contact with another cell. Endocrine signals are called hormones. Hormones are produced 
by endocrine cells and they travel through the blood to reach all parts of the body. 
Specificity of signaling can be controlled if only some cells can respond to a particular 
hormone. Paracrine signals target only cells in the vicinity of the emitting cell. 
Neurotransmitters represent an example. Some signaling molecules can function as both a 
hormone and a neurotransmitter. For example, epinephrine and norepinephrine can 
function as hormones when released from the adrenal gland and are transported to the 
heart by way of the blood stream. Norepinephrine can also be produced by neurons to 
function as a neurotransmitter within the brain. [7] Estrogen can be released by the ovary 
and function as a hormone or act locally via paracrine or autocrine signaling. 



Receptors for cell signals 

Cells receive information from their environment through a class of proteins known as 
receptors. Notch is a cell surface protein that functions as a receptor. Animals have a small 
set of genes that code for signaling proteins that interact specifically with Notch receptors 
and stimulate a response in cells that express Notch on their surface. Molecules that 
activate (or, in some cases, inhibit) receptors can be classified as hormones, 
neurotransmitters, cytokines, or growth factors, but all of these are called receptor ligands. 
The details of ligand-receptor interactions are fundamental to cell signaling. 

As shown in Figure 2 (above, left), Notch acts as a receptor for ligands that are expressed 
on adjacent cells. While many receptors are cell surface proteins, some are found inside 
cells. For example, estrogen is a hydrophobic molecule that can pass through the lipid 
bilayer of cell surface membranes. Estrogen receptors inside cells of the uterus can be 
activated by estrogen that comes from the ovaries, enters the target cells, and binds to 
estrogen receptors. 

Other signaling molecules are unable to permeate the hydrophobic cell membrane due to 
their hydrophilic nature, so their target receptor is expressed on the membrane. When such 
a signaling molecule activates its receptor, the signal is carried into the cell, usually by means 
of a second messenger such as cAMP. 
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Signaling pathways 

In some cases, receptor activation caused by ligand binding to a receptor is directly 
coupled to the cell's response to the ligand. For example, the neurotransmitter GABA can 
activate a cell surface receptor that is part of an ion channel. GABA binding to a GABA A 
receptor on a neuron opens a chloride-selective ion channel that is part of the receptor. 
GABA A receptor activation allows negatively charged chloride ions to move into the 
neuron, which inhibits the ability of the neuron to produce action potentials. However, for 
many cell surface receptors, ligand-receptor interactions are not directly linked to the cell's 
response. The activated receptor must first interact with other proteins inside the cell 
before the ultimate physiological effect of the ligand on the cell's behavior is produced. 
Often, the behavior of a chain of several interacting cell proteins is altered following 
receptor activation. The entire set of cell changes induced by receptor activation is called a 
signal transduction mechanism or pathway. 

Overview of signal transduction pathways. 

In the case of Notch-mediated signaling, the signal transduction 
mechanism can be relatively simple. As shown in Figure 2 (above, 
left), activation of Notch can cause the Notch protein to be altered 
by a protease. Part of the Notch protein is released from the cell 
surface membrane and can act to change the pattern of gene 
transcription in the cell nucleus. This causes the responding cell to 
make different proteins, resulting in an altered pattern of cell 
behavior. Cell signaling research involves studying the spatial and 
temporal dynamics of both receptors and the components of 
signaling pathways that are activated by receptors in various cell 
types. 




Figure 3. Diagram showing key components of a signal transduction pathway. See the 
MAPK/ERK pathway article for details. 



A more complex signal transduction pathway is shown in Figure 3. This pathway involves 
changes of protein-protein interactions inside the cell, induced by an external signal. Many 
growth factors bind to receptors at the cell surface and stimulate cells to progress through 
the cell cycle and divide. Several of these receptors are kinases that start to phosphorylate 
themselves and other proteins when binding to a ligand. This phosphorylation can generate 
a binding site for a different protein and thus induce protein-protein interaction. In Figure 
3, the ligand (called epidermal growth factor (EGF)) binds to the receptor (called EGFR). 
This activates the receptor to phosphorylate itself. The phosphorylated receptor binds to an 
adaptor protein (GRB2), which couples the signal to further downstream signaling 
processes. For example, one of the signal transduction pathways that are activated is called 
the mitogen-activated protein kinase (MAPK) pathway. The signal transduction component 
labeled as "MAPK" in the pathway was originally called "ERK," so the pathway is called the 
MAPK/ERK pathway. The MAPK protein is an enzyme, a protein kinase that can attach 
phosphate to target proteins such as the transcription factor MYC and, thus, alter gene 
transcription and, ultimately, cell cycle progression. Many cellular proteins are activated 
downstream of the growth factor receptors (such as EGFR) that initiate this signal 
transduction pathway. 

Some signal transduction pathways respond differently depending on the amount of 
signaling received by the cell. For instance, the hedgehog protein activates different genes, 
depending on the amount of hedgehog protein present. 

Complex multi-component signal transduction pathways provide opportunities for feedback, 
signal amplification, and interactions inside one cell between multiple signals and signaling 
pathways. 

Classification of intercellular communication 

Within endocrinology (the study of intercellular signalling in animals) and the endocrine 
system, intercellular signalling is subdivided into the following classifications: 

• Endocrine signals are produced by endocrine cells and travel through the blood to reach 
all parts of the body. 

• Paracrine signals target only cells in the vicinity of the emitting cell. Neurotransmitters 
represent an example. 

• Autocrine signals affect only cells that are of the same cell type as the emitting cell. An 
example of autocrine signaling is found in immune cells. 

• Juxtacrine signals are transmitted along cell membranes via protein or lipid components 
integral to the membrane and are capable of affecting either the emitting cell or cells 
immediately adjacent. 

See also 

• Molecular Cellular Cognition 

• Crosstalk (biology) 

• MAPK signaling pathway 

• Hedgehog signaling pathway 

• TGF beta signaling pathway 

• JAK-STAT signaling pathway 

• cAMP dependent pathway 

• Signal transduction 

• Systems biology 

• Semiotics 

• Lipid signaling 
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information about signaling pathways in human cells. 

• Cell Communication (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search& 
db=books&doptcmdl=GenBookHL&term="Cell+signaling"+AND+mboc4[book]+ 
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doptcmdl=GenBookHL&term=cell+biology+AND+mcb[book]+AND+105032[uid]& 
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H. Freeman and Company. 

• MeSH Intercellular+Signaling+Peptides+and+Proteins (http://www.nlm.nih.gov/cgi/ 
mesh/2009/MB_cgi?mode=&term=Intercellular+Signaling+Peptides+and+Proteins) 

• MeSH Cell+Communication (http://www.nlm.nih.gov/cgi/mesh/2009/ 
MB_cgi?mode=&term=Cell+Communication) 

• ESIGNET Research Project (http://www.esignet.net) 
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Molecular evolution 

Molecular evolution is the process of evolution at the scale of DNA, RNA, and proteins. 
Molecular evolution emerged as a scientific field in the 1960s as researchers from 
molecular biology, evolutionary biology and population genetics sought to understand 
recent discoveries on the structure and function of nucleic acids and protein. Some of the 
key topics that spurred development of the field have been the evolution of enzyme 
function, the use of nucleic acid divergence as a "molecular clock" to study species 
divergence, and the origin of non-functional or junk DNA. Recent advances in genomics, 
including whole-genome sequencing, high-throughput protein characterization, and 
bioinformatics have led to a dramatic increase in studies on the topic. In the 2000s, some of 
the active topics have been the role of gene duplication in the emergence of novel gene 
function, the extent of adaptive molecular evolution versus neutral drift, and the 
identification of molecular changes responsible for various human characteristics especially 
those pertaining to infection, disease, and cognition. 

Principles of molecular evolution 

Mutations 

Mutations are permanent, transmissible changes to the genetic material (usually DNA or 
RNA) of a cell. Mutations can be caused by copying errors in the genetic material during 
cell division and by exposure to radiation, chemicals, or viruses, or can occur deliberately 
under cellular control during processes such as meiosis or hypermutation. Mutations 
are considered the driving force of evolution, where less favorable (or deleterious) 
mutations are removed from the gene pool by natural selection, while more favorable (or 
beneficial) ones tend to accumulate. Neutral mutations do not affect the organism's 
chances of survival in its natural environment and can accumulate over time, which might 
result in what is known as punctuated equilibrium, a modern interpretation of classic 
evolutionary theory. 

Causes of change in allele frequency 

There are three known processes that affect the survival of a characteristic or, more 
specifically, the frequency of an allele (variant of a gene): 

• Genetic drift describes changes in gene frequency that cannot be ascribed to selective 
pressures, but are due instead to events that are unrelated to inherited traits. This is 
especially important in small mating populations, which simply cannot have enough 
offspring to maintain the same gene distribution as the parental generation. 

• Gene flow (also called migration or gene admixture) is the only one of the agents that 
makes populations closer genetically while building larger gene pools. 

• Selection, in particular natural selection produced by differential mortality and fertility. 
Differential mortality is the survival rate of individuals before their reproductive age. If 
they survive, they are then selected further by differential fertility - that is, their total 
genetic contribution to the next generation. In this way, the alleles that these surviving 
individuals contribute to the gene pool will increase the frequency of those alleles. Sexual 
selection, the attraction between mates that results from two genes, one for a feature 
and the other determining a preference for that feature, is also very important. 
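
The effect of population size on genetic drift can be illustrated with a minimal simulation. The sketch below uses the Wright-Fisher model, a standard idealization that is swapped in here as an assumption (random mating, non-overlapping generations, no selection, mutation or migration); the function name, parameters and population sizes are hypothetical choices made for the example.

```python
import random


def wright_fisher(pop_size: int, p0: float, generations: int, seed: int = 0) -> list[float]:
    """Track one allele's frequency under pure drift in a diploid population."""
    rng = random.Random(seed)        # seeded so that runs are repeatable
    copies = 2 * pop_size            # diploid population: 2N gene copies
    freq = p0
    history = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current gene pool.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        history.append(freq)
    return history


small = wright_fisher(pop_size=10, p0=0.5, generations=50)
large = wright_fisher(pop_size=1000, p0=0.5, generations=50)

print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

Running the simulation with different seeds typically shows the frequency in the small population wandering far from its starting value, and sometimes reaching 0 or 1, while the large population stays much closer to 0.5 over the same number of generations.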
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Molecular study of phylogeny 

Molecular systematics is a product of the traditional field of systematics and molecular 
genetics. It is the process of using data on the molecular constitution of biological 
organisms' DNA, RNA, or both, in order to resolve questions in systematics, i.e. about their 
correct scientific classification or taxonomy from the point of view of evolutionary biology. 

Molecular systematics has been made possible by the availability of techniques for DNA 
sequencing, which allow the determination of the exact sequence of nucleotides or bases in 
either DNA or RNA. At present it is still a long and expensive process to sequence the 
entire genome of an organism, and this has been done for only a few species. However, it is 
quite feasible to determine the sequence of a defined area of a particular chromosome. 
Typical molecular systematic analyses require the sequencing of around 1000 base pairs. 

The driving forces of evolution 

Depending on the relative importance assigned to the various forces of evolution, three 
perspectives provide evolutionary explanations for molecular evolution. [1] 

While recognizing the importance of random drift for silent mutations, selectionist 
hypotheses argue that balancing and positive selection are the driving forces of molecular 
evolution. Those hypotheses are often based on the broader view called panselectionism, 
the idea that selection is the only force strong enough to explain evolution, relegating random 
drift and mutations to minor roles. 

Neutralist hypotheses emphasize the importance of mutation, purifying selection and 
random genetic drift. The introduction of the neutral theory by Kimura, quickly 
followed by King and Jukes' own findings, led to a fierce debate about the relevance of 
neodarwinism at the molecular level. The neutral theory of molecular evolution states that 
most mutations are deleterious and quickly removed by natural selection, but of the 
remaining ones, the vast majority are neutral with respect to fitness while the number of 
advantageous mutations is vanishingly small. The fate of neutral mutations is governed by 
genetic drift, and they contribute to both nucleotide polymorphism and fixed differences 
between species. 

Mutationist hypotheses emphasize random drift and biases in mutation patterns. 
Sueoka was the first to propose a modern mutationist view. He proposed that the variation 
in GC content was not the result of positive selection, but a consequence of the GC 
mutational pressure. 

Related fields 

An important area within the study of molecular evolution is the use of molecular data to 
determine the correct biological classification of organisms. This is called molecular 
systematics or molecular phylogenetics. 

Tools and concepts developed in the study of molecular evolution are now commonly used 
for comparative genomics and molecular genetics, while the influx of new data from these 
fields has been spurring advancement in molecular evolution. 
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Key researchers in molecular evolution 

Some researchers who have made key contributions to the development of the field: 

Motoo Kimura — Neutral theory 

Masatoshi Nei — Adaptive evolution 

Walter M. Fitch — Phylogenetic reconstruction 

Walter Gilbert — RNA world 

Joe Felsenstein — Phylogenetic methods 

Susumu Ohno — Gene duplication 

John H. Gillespie — Mathematics of adaptation 



Journals and societies 

Journals dedicated to molecular evolution include Molecular Biology and Evolution, Journal 
of Molecular Evolution, and Molecular Phylogenetics and Evolution. Research in molecular 
evolution is also published in journals of genetics, molecular biology, genomics, 
systematics, or evolutionary biology. The Society for Molecular Biology and Evolution 
publishes the journal "Molecular Biology and Evolution" and holds an annual international 
meeting. 

See also 
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Further reading 

• Li, W.-H. (2006). Molecular Evolution. Sinauer. ISBN 0878934804. 

• Lynch, M. (2007). The Origins of Genome Architecture. Sinauer. ISBN 0878934847. 
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Molecular phylogenetics 

Molecular phylogenetics, also known as molecular systematics, is the use of the 
structure of molecules to gain information on an organism's evolutionary relationships. The 
result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. 

Techniques and applications 

Every living organism contains DNA, RNA, and proteins. Closely related organisms 
generally have a high degree of agreement in the molecular structure of these substances, 
while the molecules of organisms distantly related usually show a pattern of dissimilarity. 
Conserved sequences such as mitochondrial DNA are expected to accumulate mutations over 
time and, assuming a constant rate of mutation, provide a molecular clock for dating 
divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows 
the probable evolution of various organisms. Not until recent decades, however, has it been 
possible to isolate and identify these molecular structures. 
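
The molecular clock idea can be made concrete with a short sketch. It assumes two already-aligned sequences of equal length, uses the Jukes-Cantor correction (a standard model that is not named in the text above) to estimate substitutions per site, and assumes the same constant substitution rate on both lineages. The sequences and the rate below are made-up values for illustration only.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site (valid for p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time(d: float, rate_per_site_per_year: float) -> float:
    """Years since the common ancestor, with both lineages changing at the same rate."""
    return d / (2.0 * rate_per_site_per_year)


# Hypothetical aligned fragments and a hypothetical substitution rate.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"
p = p_distance(seq_1, seq_2)
d = jukes_cantor(p)
print(f"observed difference p = {p:.2f}, corrected distance d = {d:.3f}")
print(f"estimated divergence time: {divergence_time(d, 1e-8):.3g} years")
```

The factor of two in `divergence_time` reflects that changes accumulate independently along both lineages after they split. Real analyses use far longer sequences and more elaborate substitution models.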

The most common approach is the comparison of sequences for genes using sequence 
alignment techniques to identify similarity. Another application of molecular phylogeny is in 
DNA barcoding, where the species of an individual organism is identified using small 
sections of mitochondrial DNA. Another application of the techniques that make this 
possible can be seen in the very limited field of human genetics, such as the ever more 
popular use of genetic testing to determine a child's paternity, as well as the emergence of 
a new branch of criminal forensics focused on evidence known as genetic fingerprinting. 

The effect on traditional biological classification schemes in the biological sciences has 
been dramatic as well. Work that was once immensely labor- and materials-intensive can 
now be done quickly and easily, leading to yet another source of information becoming 
available for systematic and taxonomic appraisal. This particular kind of data has become 
so popular that taxonomical schemes based solely on molecular data may be encountered. 

Theoretical background 

Early attempts at molecular systematics were also termed chemotaxonomy and made use 
of proteins, enzymes, carbohydrates and other molecules which were separated and 
characterized using techniques such as chromatography. These have been largely replaced 
in recent times by DNA sequencing which produces the exact sequences of nucleotides or 
bases in either DNA or RNA segments extracted using different techniques. These are 
generally considered superior for evolutionary studies since the actions of evolution are 
ultimately reflected in the genetic sequences. At present it is still a long and expensive 
process to sequence the entire DNA of an organism (its genome), and this has been done 
for only a few species. However it is quite feasible to determine the sequence of a defined 
area of a particular chromosome. Typical molecular systematic analyses require the 
sequencing of around 1000 base pairs. At any location within such a sequence, the bases 
found in a given position may vary between organisms. The particular sequence found in a 
given organism is referred to as its haplotype. In principle, since there are four base types, 
with 1000 base pairs, we could have 4^1000 distinct haplotypes. However, for organisms 
within a particular species or in a group of related species, it has been found empirically 
that only a minority of sites show any variation at all and most of the variations that are 
found are correlated, so that the number of distinct haplotypes that are found is relatively 
small. 

In a molecular systematic analysis, the haplotypes are determined for a defined area of 
genetic material; ideally, a substantial sample of individuals of the target species or other 
taxon is used, although many current studies are based on single individuals. Haplotypes of 
individuals of closely related, but supposedly different, taxa are also determined. Finally, 
haplotypes from a smaller number of individuals from a definitely different taxon are 
determined: these are referred to as an outgroup. The base sequences for the haplotypes 
are then compared. In the simplest case, the difference between two haplotypes is assessed 
by counting the number of locations where they have different bases: this is referred to as 
the number of substitutions (other kinds of differences between haplotypes can also occur, 
for example the insertion of a section of nucleic acid in one haplotype that is not present in 
another). Usually the difference between organisms is re-expressed as a percentage 
divergence, by dividing the number of substitutions by the number of base pairs analysed: 
the hope is that this measure will be independent of the location and length of the section 
of DNA that is sequenced. 
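
The simplest comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the two haplotypes are assumed to be already aligned to the same length, and insertions or deletions are not handled. 

```python
def count_substitutions(hap_a: str, hap_b: str) -> int:
    """Count aligned positions at which two haplotypes carry different bases."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def percent_divergence(hap_a: str, hap_b: str) -> float:
    """Substitutions divided by the number of base pairs analysed, as a percentage."""
    if not hap_a:
        raise ValueError("haplotypes must not be empty")
    return 100.0 * count_substitutions(hap_a, hap_b) / len(hap_a)
```

For two ten-base haplotypes that differ at two positions, percent_divergence returns 20.0. 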

An older and superseded approach was to determine the divergences between the 
genotypes of individuals by DNA-DNA hybridisation. The advantage claimed for using 
hybridisation rather than gene sequencing was that it was based on the entire genotype, 
rather than on particular sections of DNA. Modern sequence comparison techniques 
overcome this objection by the use of multiple sequences. 

Once the divergences between all pairs of samples have been determined, the resulting 
triangular matrix of differences is submitted to some form of statistical cluster analysis, and 
the resulting dendrogram is examined in order to see whether the samples cluster in the 
way that would be expected from current ideas about the taxonomy of the group, or not. 
Any group of haplotypes that are all more similar to one another than any of them is to any 
other haplotype may be said to constitute a clade. Statistical techniques such as 
bootstrapping and jackknifing help in providing reliability estimates for the positions of 
haplotypes within the evolutionary trees. 
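
As a rough sketch of how bootstrapping and jackknifing generate replicate data sets, the Python below resamples the columns of an alignment. The function names, the dictionary representation of the alignment and the default keep_fraction of 0.5 are assumptions made for illustration; building a tree from each replicate and counting how often a grouping reappears is left out. 

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # The same column indices are used for every sequence, so homologous
    # positions stay together in the replicate.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns without replacement (one jackknife replicate)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    n_keep = max(1, int(n_sites * keep_fraction))
    columns = sorted(rng.sample(range(n_sites), n_keep))
    return {name: "".join(alignment[name][c] for c in columns) for name in names}
```

Passing a seeded generator such as random.Random(1) makes the replicates reproducible. The support for a clade would then be estimated as the fraction of replicates whose tree contains it. 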

Characteristics and assumptions of molecular systematics 

This example illustrates several characteristics of molecular systematics and its underlying 
assumptions. 

1. Molecular systematics is an essentially cladistic approach: it assumes that classification 
must correspond to phylogenetic descent, and that all valid taxa must be monophyletic. 

2. Molecular systematics often uses the molecular clock assumption that quantitative 
similarity of genotype is a sufficient measure of the recency of genetic divergence. 
Particularly in relation to speciation, this assumption could be wrong if either 

1. some relatively small genotypic modification acted to prevent interbreeding between 
two groups of organisms, or 

2. in different subgroups of the organisms being considered, genetic modification 
proceeded at different rates. 

3. In animals, it is often convenient to use mitochondrial DNA for molecular systematic 
analysis. However, because in mammals mitochondria are inherited only from the 
mother, this is not fully satisfactory, because inheritance in the paternal line might not be 
detected: in the example above, Vila et al. cite more limited studies with chromosomal 
DNA that support their conclusions. 

These characteristics and assumptions are not wholly uncontroversial among biological 
systematists. As a cladistic method, molecular systematics is open to the same criticisms as 
cladistics in general. It can also be argued that it is a mistake to replace a classification 
based on visible and ecologically relevant characteristics by one based on genetic details 
that may not even be expressed in the phenotype. However the molecular approach to 
systematics, and its underlying assumptions, are gaining increasing acceptance. As gene 
sequencing becomes easier and cheaper, molecular systematics is being applied to more 
and more groups, and in some cases is leading to radical revisions of accepted taxonomies. 

History of molecular phylogenetics 

The theoretical frameworks for molecular systematics were laid in the 1960s in the works 
of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling and Walter M. Fitch. 
Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert 
C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, 
Robert K. Selander, and John C. Avise (who studied various groups). Work with protein 
electrophoresis began around 1956. Although the results were not quantitative and did not 
initially improve on morphological classification, they provided tantalizing hints that 
long-held notions of the classifications of birds, for example, needed substantial revision. In 
the period of 1974-1986, DNA-DNA hybridization was the dominant technique. 

Further reading 

• Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates Incorporated. ISBN 
0-87893-177-5. 

• Hillis, D. M. & Moritz, C. 1996. Molecular systematics. 2nd ed. Sinauer Associates 
Incorporated. ISBN 0-87893-282-8. 

• Page, R. D. M. & Holmes, E. C. 1998. Molecular evolution: a phylogenetic approach. 
Blackwell Science, Oxford. ISBN 0-86542-889-1. 

External links 

• NCBI - Systematics and Molecular Phylogenetics (http://www.ncbi.nlm.nih.gov/About/primer/phylo.html) 

Computational phylogenetics 

Computational phylogenetics is the application of computational algorithms, methods 
and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree 
representing a hypothesis about the evolutionary ancestry of a set of genes, species, or 
other taxa. For example, these techniques have been used to explore the family tree of 
hominid species and the relationships between specific genes shared by many types of 
organisms. Traditional phylogenetics relies on morphological data obtained by measuring 
and quantifying the phenotypic properties of representative organisms, while the more 
recent field of molecular phylogenetics uses nucleotide sequences encoding genes or amino 
acid sequences encoding proteins as the basis for classification. Many forms of molecular 
phylogenetics are closely related to and make extensive use of sequence alignment in 
constructing and refining phylogenetic trees, which are used to classify the evolutionary 
relationships between homologous genes represented in the genomes of divergent species. 
The phylogenetic trees constructed by computational methods are unlikely to perfectly 
reproduce the evolutionary tree that represents the historical relationships between the 
species being analyzed. The historical species tree may also differ from the historical tree 
of an individual homologous gene shared by those species. 

Producing a phylogenetic tree requires a measure of homology among the characteristics 
shared by the taxa being compared. In morphological studies, this requires explicit 
decisions about which physical characteristics to measure and how to use them to encode 
distinct states corresponding to the input taxa. In molecular studies, a primary problem is 
in producing a multiple sequence alignment (MSA) between the genes or amino acid 
sequences of interest. Progressive sequence alignment methods produce a phylogenetic 
tree by necessity because they incorporate new sequences into the calculated alignment in 
order of genetic distance. Although a phylogenetic tree can always be constructed from an 
MSA, phylogenetics methods such as maximum parsimony and maximum likelihood do not 
require the production of an initial or concurrent MSA. 

Types of phylogenetic trees 

Phylogenetic trees generated by computational phylogenetics can be either rooted or 
unrooted depending on the input data and the algorithm used. A rooted tree is a directed 
graph that explicitly identifies a most recent common ancestor (MRCA), usually an imputed 
sequence that is not represented in the input. Genetic distance measures can be used to 
plot a tree with the input sequences as leaf nodes and their distances from the root 
proportional to their genetic distance from the hypothesized MRCA. Identification of a root 
usually requires the inclusion in the input data of at least one "outgroup" known to be only 
distantly related to the sequences of interest. 

By contrast, unrooted trees plot the distances and relationships between input sequences 
without making assumptions regarding their descent. An unrooted tree can always be 
produced from a rooted tree, but a root cannot usually be placed on an unrooted tree 
without additional data on divergence rates, such as the assumption of the molecular clock 
hypothesis. 

The set of all possible phylogenetic trees for a given group of input sequences can be 
conceptualized as a discretely defined multidimensional "tree space" through which search 
paths can be traced by optimization algorithms. Although counting the total number of 
trees for a nontrivial number of input sequences can be complicated by variations in the 
definition of a tree topology, it is always true that there are more rooted than unrooted 
trees for a given number of inputs and choice of parameters. 
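
To give a sense of the size of tree space, the sketch below counts fully resolved (strictly bifurcating) labelled trees using the standard double-factorial formulas: (2n - 5)!! unrooted and (2n - 3)!! rooted trees for n input sequences. Restricting the count to bifurcating trees is an assumption about the definition of a tree topology, and the function names are illustrative. 

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees on n_taxa labelled leaves (n_taxa >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa for an unrooted tree")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating trees on n_taxa labelled leaves (n_taxa >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa for a rooted tree")
    return double_factorial(2 * n_taxa - 3)
```

For ten sequences this gives 2,027,025 unrooted and 34,459,425 rooted trees; the rooted count is larger because a root can be placed on any of the 2n - 3 branches of each unrooted tree. 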

Coding characters and defining homology 

Morphological analysis 

The basic problem in morphological phylogenetics is the assembly of a matrix representing 
a mapping from each of the taxa being compared to representative measurements for each 
of the phenotypic characteristics being used as a classifier. The types of phenotypic data 
used to construct this matrix depend on the taxa being compared; for individual species, 
they may involve measurements of average body size, lengths or sizes of particular bones or 
other physical features, or even behavioral manifestations. Of course, since not every 
possible phenotypic characteristic could be measured and encoded for analysis, the 
selection of which features to measure is a major inherent obstacle to the method. The 
decision of which traits to use as a basis for the matrix necessarily represents a hypothesis 
about which traits of a species or higher taxon are evolutionarily relevant. Morphological 
studies can be confounded by examples of convergent evolution of phenotypes. A major 
challenge in constructing useful classes is the high likelihood of inter-taxon overlap in the 
distribution of the phenotype's variation. The inclusion of extinct taxa in morphological 
analysis is often difficult due to absence of or incomplete fossil records, but has been shown 
to have a significant effect on the trees produced; in one study only the inclusion of extinct 
species of apes produced a morphologically derived tree that was consistent with that 
produced from molecular data. 

Some phenotypic classifications, particularly those used when analyzing very diverse 
groups of taxa, are discrete and unambiguous; classifying organisms as possessing or 
lacking a tail, for example, is straightforward in the majority of cases, as is counting 
features such as eyes or vertebrae. However, the most appropriate representation of 
continuously varying phenotypic measurements is a controversial problem without a 
general solution. A common method is simply to sort the measurements of interest into two 
or more classes, rendering continuous observed variation as discretely classifiable (e.g., all 
examples with humerus bones longer than a given cutoff are scored as members of one 
state, and all members whose humerus bones are shorter than the cutoff are scored as 
members of a second state). This results in an easily manipulated data set but has been 
criticized for poor reporting of the basis for the class definitions and for sacrificing 
information compared to methods that use a continuous weighted distribution of 
measurements. 
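
The cutoff-based coding described above can be sketched as follows. The function name, the 0/1 state labels and the decision to score a value exactly equal to the cutoff as state 0 are assumptions for illustration, and the measurements in the usage note are invented. 

```python
def code_by_cutoff(measurements, cutoff):
    """Turn continuous measurements into a two-state character.

    measurements maps a taxon name to a numeric value (for example, a
    humerus length). Values longer than the cutoff are scored as state 1,
    all others as state 0.
    """
    return {taxon: (1 if value > cutoff else 0) for taxon, value in measurements.items()}
```

With made-up lengths {"taxon_a": 31.0, "taxon_b": 18.5, "taxon_c": 25.0} and a cutoff of 25.0, only taxon_a is scored as state 1. The example also shows the information loss mentioned above: taxon_b and taxon_c receive the same code although their measurements differ. 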

Because morphological data is extremely labor-intensive to collect, whether from literature 
sources or from field observations, reuse of previously compiled data matrices is not 
uncommon, although this may propagate flaws in the original matrix into multiple 
derivative analyses. 

Molecular analysis 

The problem of character coding is very different in molecular analyses, as the characters 
in biological sequence data are immediate and discretely defined - distinct nucleotides in 
DNA or RNA sequences and distinct amino acids in protein sequences. However, defining 
homology can be challenging due to the inherent difficulties of multiple sequence 
alignment. For a given gapped MSA, several rooted phylogenetic trees can be constructed 
that vary in their interpretations of which changes are "mutations" versus ancestral 
characters, and which events are insertion mutations or deletion mutations. For example, 
given only a pairwise alignment with a gap region, it is impossible to determine whether 
one sequence bears an insertion mutation or the other carries a deletion. The problem is 
magnified in MSAs with unaligned and nonoverlapping gaps. In practice, sizable regions of 
a calculated alignment may be discounted in phylogenetic tree construction to avoid 
integrating noisy data into the tree calculation. 

Distance-matrix methods 

Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic 
distance" between the sequences being classified, and therefore they require an MSA as an 
input. Distance is often defined as the fraction of mismatches at aligned positions, with 
gaps either ignored or counted as mismatches. Distance methods attempt to construct an 
all-to-all matrix from the sequence query set describing the distance between each 
sequence pair. From this is constructed a phylogenetic tree that places closely related 
sequences under the same interior node and whose branch lengths closely reproduce the 
observed distances between sequences. Distance-matrix methods may produce either 
rooted or unrooted trees, depending on the algorithm used to calculate them. They are 
frequently used as the basis for progressive and iterative types of multiple sequence 
alignments. The main disadvantage of distance-matrix methods is their inability to 
efficiently use information about local high-variation regions that appear across multiple 
subtrees. 
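
A minimal sketch of the all-to-all matrix, assuming the alignment is held as a dictionary of equal-length strings with "-" as the gap character. The names and the gaps_as_mismatch switch are illustrative; positions where both sequences have a gap are skipped in either mode, which is a choice made for this sketch. 

```python
def p_distance(seq_a, seq_b, gaps_as_mismatch=False):
    """Fraction of mismatches at aligned positions of two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a shared gap carries no information about this pair
        if a == "-" or b == "-":
            if gaps_as_mismatch:
                compared += 1
                mismatches += 1
            continue  # otherwise the gapped position is ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def distance_matrix(alignment, gaps_as_mismatch=False):
    """All-to-all distances as a nested dictionary keyed by sequence name."""
    names = list(alignment)
    return {
        a: {b: p_distance(alignment[a], alignment[b], gaps_as_mismatch) for b in names}
        for a in names
    }
```

For "ACGT-A" against "ACCTTA", ignoring the gap gives 1 mismatch in 5 compared positions (0.2), while counting it as a mismatch gives 2 in 6. 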

Neighbor-joining 

Neighbor-joining methods apply general data clustering techniques to sequence analysis 
using genetic distance as a clustering metric. The simple neighbor-joining method produces 
unrooted trees, but it does not assume a constant rate of evolution (i.e., a molecular clock) 
across lineages. Its relative, UPGMA (Unweighted Pair Group Method with Arithmetic 
mean) produces rooted trees and requires a constant-rate assumption - that is, it assumes 
an ultrametric tree in which the distances from the root to every branch tip are equal. 
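
The constant-rate clustering performed by UPGMA can be sketched as below. This is a compact illustration rather than a reference implementation: the input format (a dictionary from pairs of names to distances), the nested-tuple output and the function name are all assumptions, and ties are broken by whichever pair is encountered first. 

```python
from itertools import combinations


def upgma(distances):
    """Cluster taxa by UPGMA.

    distances maps a 2-tuple of distinct taxon names to their distance and
    must contain every pair once. Returns (tree, height), where tree is a
    nested tuple of names and height is the root-to-tip distance.
    """
    d = {}      # frozenset({cluster_x, cluster_y}) -> average distance
    info = {}   # active cluster -> (number of leaves, height of its node)
    for (a, b), value in distances.items():
        d[frozenset((a, b))] = value
        info[a] = (1, 0.0)
        info[b] = (1, 0.0)
    while len(info) > 1:
        # Join the two closest active clusters.
        x, y = min(combinations(info, 2), key=lambda pair: d[frozenset(pair)])
        size_x, _ = info.pop(x)
        size_y, _ = info.pop(y)
        merged = (x, y)
        # The new node sits halfway between the clusters, so all tips
        # below it are equidistant from it (the ultrametric property).
        height = d[frozenset((x, y))] / 2.0
        for other in info:
            # Size-weighted mean equals the arithmetic mean over all leaf pairs.
            d[frozenset((merged, other))] = (
                size_x * d[frozenset((x, other))] + size_y * d[frozenset((y, other))]
            ) / (size_x + size_y)
        info[merged] = (size_x + size_y, height)
    tree = next(iter(info))
    return tree, info[tree][1]
```

With distances A-B = 2, A-C = 6 and B-C = 6, A and B are joined first at height 1 and C is attached at height 3, so every tip lies at distance 3 from the root. 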

Fitch-Margoliash method 

The Fitch-Margoliash method uses a weighted least squares method for clustering based on 
genetic distance. Closely related sequences are given more weight in the tree 
construction process to correct for the increased inaccuracy in measuring distances 
between distantly related sequences. The distances used as input to the algorithm must be 
normalized to prevent large artifacts in computing relationships between closely related 
and distantly related groups. The distances calculated by this method must be linear; the 
linearity criterion for distances requires that the expected values of the branch lengths for 
two individual branches must equal the expected value of the sum of the two branch 
distances - a property that applies to biological sequences only when they have been 
corrected for the possibility of back mutations at individual sites. This correction is done 
through the use of a substitution matrix such as that derived from the Jukes-Cantor model 
of DNA evolution. The distance correction is only necessary in practice when the evolution 
rates differ among branches. 
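
As an illustration of such a correction, the standard Jukes-Cantor formula converts an observed fraction of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p). The function name below is an assumption; the formula itself is the usual Jukes-Cantor distance. 

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed mismatch fraction p.

    The correction is undefined for p >= 0.75, the level of difference
    expected between two unrelated random sequences under this model.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

The corrected value is always at least as large as p and grows faster as p increases: an observed difference of 0.30 corresponds to roughly 0.38 substitutions per site, reflecting multiple changes at the same site that a simple count misses. 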

The least-squares criterion applied to these distances is more accurate but less efficient 
than the neighbor-joining methods. An additional improvement that corrects for 
correlations between distances that arise from many closely related sequences in the data 
set can also be applied at increased computational cost. Finding the optimal least-squares 
tree with any correction factor is NP-complete, so heuristic search methods like those 
used in maximum-parsimony analysis are applied to the search through tree space. 

Using outgroups 

Independent information about the relationship between sequences or groups can be used 
to help reduce the tree search space and root unrooted trees. Standard usage of 
distance-matrix methods involves the inclusion of at least one outgroup sequence known to 
be only distantly related to the sequences of interest in the query set. This usage can be 
seen as a type of experimental control. If the outgroup has been appropriately chosen, it 
will have a much greater genetic distance and thus a longer branch length than any other 
sequence, and it will appear near the root of a rooted tree. Choosing an appropriate 
outgroup requires the selection of a sequence that is moderately related to the sequences 
of interest; too close a relationship defeats the purpose of the outgroup and too distant adds 
noise to the analysis. Care should also be taken to avoid situations in which the species 
from which the sequences were taken are distantly related, but the gene encoded by the 
sequences is highly conserved across lineages. Horizontal gene transfer, especially between 
otherwise divergent bacteria, can also confound outgroup usage. 

Maximum parsimony 

Maximum parsimony (MP) is a method of identifying the potential phylogenetic tree that 
requires the smallest total number of evolutionary events to explain the observed sequence 
data. Some ways of scoring trees also include a "cost" associated with particular types of 
evolutionary events and attempt to locate the tree with the smallest total cost. This is a 
useful approach in cases where not every possible type of event is equally likely - for 
example, when particular nucleotides or amino acids are known to be more mutable than 
others. 
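
For a fixed tree, the minimum number of changes can be counted with Fitch's algorithm, a standard technique that is swapped in here for illustration and is not named in the text above. The sketch assumes equal cost for every change (the weighted "cost" variant is not covered), a rooted binary tree written as nested tuples of leaf names, and an alignment held as a dictionary of equal-length strings; all names are hypothetical. 

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name (str) or a 2-tuple of subtrees; states maps each
    leaf name to the character observed at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no change is needed here.
        return common, left_cost + right_cost
    # Otherwise one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )
```

With the sequences A = AAG, B = AAG, C = ACT and D = ACT, the tree (("A", "B"), ("C", "D")) needs 2 changes, whereas (("A", "C"), ("B", "D")) needs 4, so the first tree is preferred under maximum parsimony. 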

The most naive way of identifying the most parsimonious tree is simple enumeration - 
considering each possible tree in succession and searching for the tree with the smallest 
score. However, this is only possible for a relatively small number of sequences or species 
because the problem of identifying the most parsimonious tree is known to be NP-hard; 
consequently a number of heuristic search methods for optimization have been developed 
to locate a highly parsimonious tree, if not the most optimal in the set. Most such methods 
involve a steepest descent-style minimization mechanism operating on a tree 
rearrangement criterion. 

Branch and bound 

The branch and bound algorithm is a general method used to increase the efficiency of 
searches for near-optimal solutions of NP-hard problems, first applied to phylogenetics in 
the early 1980s. Branch and bound is particularly well suited to phylogenetic tree 
construction because it inherently requires dividing a problem into a tree structure as it 
subdivides the problem space into smaller regions. As its name implies, it requires as input 
both a branching rule (in the case of phylogenetics, the addition of the next species or 
sequence to the tree) and a bound (a rule that excludes certain regions of the search space 
from consideration, thereby assuming that the optimal solution cannot occupy that region). 
Identifying a good bound is the most challenging aspect of the algorithm's application to 
phylogenetics. A simple way of defining the bound is a maximum number of assumed 
evolutionary changes allowed per tree. A set of criteria known as Zharkikh's rules 
severely limits the search space by defining characteristics shared by all candidate "most 
parsimonious" trees. The two most basic rules require the elimination of all but one 
redundant sequence (for cases where multiple observations have produced identical data) 
and the elimination of character sites at which two or more states do not occur in at least 
two species. Under ideal conditions these rules and their associated algorithm would 
completely define a tree. 
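
The two basic rules can be sketched as simple filters on an alignment. This is an interpretation of the rules as stated above, not a full implementation of the associated algorithm; the function names and the dictionary representation are assumptions. 

```python
from collections import Counter


def drop_duplicate_sequences(alignment):
    """Keep only the first of any set of identical sequences."""
    seen = set()
    kept = {}
    for name, seq in alignment.items():
        if seq not in seen:
            seen.add(seq)
            kept[name] = seq
    return kept


def informative_sites(alignment):
    """Indices of sites at which at least two states each occur in at least two sequences."""
    seqs = list(alignment.values())
    keep = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            keep.append(i)
    return keep
```

Sites that fail the second test cost the same number of changes on every tree, so removing them does not alter which tree is most parsimonious. In this sketch the duplicate filter is applied first, because a site can appear informative only as a result of repeated identical sequences. 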

Sankoff-Morel-Cedergren algorithm 

The Sankoff-Morel-Cedergren algorithm was among the first published methods to 
simultaneously produce an MSA and a phylogenetic tree for nucleotide sequences. The 
method uses a maximum parsimony calculation in conjunction with a scoring function that 
penalizes gaps and mismatches, thereby favoring the tree that introduces a minimal 
number of such events. The imputed sequences at the interior nodes of the tree are scored 
and summed over all the nodes in each possible tree. The lowest-scoring tree sum provides 
both an optimal tree and an optimal MSA given the scoring function. Because the method is 
highly computationally intensive, an approximate method can be used in which initial guesses for the 
interior alignments are refined one node at a time. Both the full and the approximate 
version are in practice calculated by dynamic programming. 

MALIGN and POY 

More recent phylogenetic tree/MSA methods use heuristics to isolate high-scoring, but not 
necessarily optimal, trees. The MALIGN method uses a maximum-parsimony technique to 
compute a multiple alignment by maximizing a cladogram score, and its companion POY 
uses an iterative method that couples the optimization of the phylogenetic tree with 
improvements in the corresponding MSA. However, the use of these methods in 
constructing evolutionary hypotheses has been criticized as biased due to the deliberate 
construction of trees reflecting minimal evolutionary events. Both programs are 
available from the American Museum of Natural History. 

Maximum likelihood 

The maximum likelihood method uses standard statistical techniques for inferring 
probability distributions to assign probabilities to particular possible phylogenetic trees. 
The method requires a substitution model to assess the probability of particular mutations; 
roughly, a tree that requires more mutations at interior nodes to explain the observed 
phylogeny will be assessed as having a lower probability. This is broadly similar to the 
maximum-parsimony method, but maximum likelihood allows additional statistical flexibility 
by permitting varying rates of evolution across both lineages and sites. In fact, the method 
requires that evolution at different sites and along different lineages must be statistically 
independent. Maximum likelihood is thus well suited to the analysis of distantly related 
sequences, but because it formally requires search of all possible combinations of tree 
topology and branch length, it is computationally expensive to perform on more than a few 
sequences. 

The "pruning" algorithm, a variant of dynamic programming, is often used to reduce the 
search space by efficiently calculating the likelihood of subtrees. The method calculates 
the likelihood for each site in a "linear" manner, starting at a node whose only descendants 
are leaves (that is, the tips of the tree) and working backwards toward the "bottom" node in 
nested sets. However, the trees produced by the method are only rooted if the substitution 
model is irreversible, which is not generally true of biological systems. The search for the 
maximum-likelihood tree also includes a branch length optimization component that is 
difficult to improve upon algorithmically; general global optimization tools such as the 
Newton-Raphson method are often used. Searching tree topologies defined by likelihood 
has not been shown to be NP-complete, but remains extremely challenging because 
branch-and-bound search is not yet effective for trees represented in this way. 
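
A minimal sketch of the pruning calculation for a single site is given below. It assumes the Jukes-Cantor substitution model with branch lengths in expected substitutions per site, equal base frequencies at the root, and a hypothetical tree encoding in which an internal node is a tuple of (child, branch length) pairs and a leaf is its name; none of these choices comes from the text above. 

```python
import math

BASES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability that base i is replaced by base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node, site_states):
    """Likelihood of the data below node, conditional on each possible base at node."""
    if isinstance(node, str):
        # A leaf is compatible only with the base actually observed.
        return {b: 1.0 if b == site_states[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child, site_states)
        for b in BASES:
            # Sum over the unknown base at the child's end of the branch.
            result[b] *= sum(jc_prob(b, c, t) * child_l[c] for c in BASES)
    return result


def site_likelihood(tree, site_states):
    """Likelihood of one site, averaging over the base at the bottom node."""
    partial = partial_likelihoods(tree, site_states)
    return sum(0.25 * partial[b] for b in BASES)
```

Because each subtree is evaluated once and reused, the cost grows linearly with the number of nodes. For the two-leaf tree (("x", 0.1), ("y", 0.1)) with both leaves showing A, the result equals 0.25 times the probability of A remaining A over a total length of 0.2, which illustrates why a reversible model cannot locate the root: only the total path length between the leaves matters. The likelihood of a whole alignment would be the product of the site likelihoods, usually accumulated as a sum of logarithms. 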

Bayesian inference 

Bayesian inference can be used to produce phylogenetic trees in a manner closely related 
to the maximum likelihood methods. Bayesian methods assume a prior probability 
distribution of the possible trees, which may simply be the probability of any one tree 
among all the possible trees that could be generated from the data, or may be a more 
sophisticated estimate derived from the assumption that divergence events such as 
speciation occur as stochastic processes. The choice of prior distribution is a point of 
contention among users of Bayesian-inference phylogenetics methods. 

Implementations of Bayesian methods generally use Markov chain Monte Carlo sampling 
algorithms, although the choice of move set varies; selections used in Bayesian 
phylogenetics include circularly permuting leaf nodes of a proposed tree at each step 
and swapping descendant subtrees of a random internal node between two related trees. 
The use of Bayesian methods in phylogenetics has been controversial, largely due to 
incomplete specification of the choice of move set, acceptance criterion, and prior 
distribution in published work. 

Model selection 

Molecular phylogenetics methods rely on a defined substitution model that encodes a 
hypothesis about the relative rates of mutation at various sites along the gene or amino acid 
sequences being studied. At their simplest, substitution models aim to correct for 
differences in the rates of transitions and transversions in nucleotide sequences. The use of 
substitution models is necessitated by the fact that the genetic distance between two 
sequences increases linearly only for a short time after the two sequences diverge from 
each other (alternatively, the distance is linear only shortly before coalescence). The longer 
the amount of time after divergence, the more likely it becomes that two mutations occur at 
the same nucleotide site. Simple genetic distance calculations will thus undercount the 
number of mutation events that have occurred in evolutionary history. The extent of this 
undercount increases with increasing time since divergence, which can lead to the 
phenomenon of long branch attraction, or the misassignment of two distantly related but 
convergently evolving sequences as closely related. The maximum parsimony method is 
particularly susceptible to this problem due to its explicit search for a tree representing a 
minimum number of distinct evolutionary events. 

Types of models 

All substitution models assign a set of weights to each possible change of state represented 
in the sequence. The most common model types are implicitly reversible because they 
assign the same weight to, for example, a G>C nucleotide mutation as to a C>G mutation. 
The simplest possible model, the Jukes-Cantor model, assigns an equal probability to every 
possible change of state for a given nucleotide base. The rate of change between any two 
distinct nucleotides will be one-third of the overall substitution rate. More advanced 
models distinguish between transitions and transversions. The most general possible 
time-reversible model, called the GTR model, contains six mutation rate parameters. An 
even more generalized model known as the general 12-parameter model breaks 
time-reversibility, at the cost of much additional complexity in calculating genetic distances 
that are consistent among multiple lineages. One possible variation on this theme adjusts 
the rates so that overall GC content - an important measure of DNA double helix stability - 
varies over time. 
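
The two simplest kinds of model can be sketched as rate matrices. In the Python below, kappa is an assumed name for the ratio of the transition rate to the transversion rate: kappa = 1 reproduces the Jukes-Cantor model, and other values give a two-parameter model in the style of Kimura's, which is used here only as an example of a model that separates transitions from transversions. 

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine-purine (A, G) or pyrimidine-pyrimidine (C, T) changes."""
    return x != y and ((x in PURINES) == (y in PURINES))


def rate_matrix(kappa=1.0, total_rate=1.0):
    """Substitution rate matrix as a nested dictionary q[from_base][to_base].

    Each base has one possible transition and two possible transversions,
    so the transversion rate is total_rate / (kappa + 2).
    """
    beta = total_rate / (kappa + 2.0)
    q = {}
    for x in BASES:
        q[x] = {}
        for y in BASES:
            if x != y:
                q[x][y] = kappa * beta if is_transition(x, y) else beta
        # The diagonal makes each row sum to zero.
        q[x][x] = -sum(q[x].values())
    return q
```

With kappa = 1 every off-diagonal entry is one-third of the overall rate, as stated above for the Jukes-Cantor model. Both variants give the same rate to a change and to its reverse, so the matrix is symmetric and the model is time-reversible. 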

Models may also allow for the variation of rates with positions in the input sequence. The 
most obvious example of such variation follows from the arrangement of nucleotides in 
protein-coding genes into three-base codons. If the location of the open reading frame 
(ORF) is known, rates of mutation can be adjusted for position of a given site within a 
codon, since it is known that wobble base pairing can allow for higher mutation rates in the 
third nucleotide of a given codon without affecting the codon's meaning in the genetic 
code. A less hypothesis-driven example that does not rely on ORF identification simply 
assigns to each site a rate randomly drawn from a predetermined distribution, often the 
gamma distribution or log-normal distribution. Finally, a more conservative estimate of 
rate variations known as the covarion method allows autocorrelated variations in rates, so 
that the mutation rate of a given site is correlated across sites and lineages. 

Choosing the best model 

The selection of an appropriate model is critical for the production of good phylogenetic 
analyses, both because underparameterized or overly restrictive models may produce 
aberrant behavior when their underlying assumptions are violated, and because overly 
complex or overparameterized models are computationally expensive and the parameters 
may be overfit. The most common method of model selection is the likelihood ratio test 
(LRT), which compares the maximized likelihoods of two nested models; each likelihood can 
be interpreted as a measure of "goodness of fit" between the model and the input data. 
However, care must be taken in using these results, since a more complex model with more 
parameters will always have a likelihood at least as high as that of a simplified version of 
the same model, which can lead to the naive selection of models that are overly complex. 
For this reason, model selection programs choose the simplest model that is not 
significantly worse than more complex substitution models. A significant disadvantage of the LRT is the necessity of making a 
series of pairwise comparisons between models; it has been shown that the order in which 
the models are compared has a major effect on the one that is eventually selected. 
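
A single pairwise comparison of this kind might look as follows. This is a sketch under stated assumptions: the two models are nested, the test statistic is compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters, and the log-likelihood values are invented for illustration. The helper names chi2_sf and likelihood_ratio_test are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params, alpha=0.05):
    """Return (statistic, p-value, prefer_complex) for two nested models."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    p_value = chi2_sf(statistic, extra_params)
    return statistic, p_value, p_value < alpha

# Hypothetical maximized log-likelihoods; the complex model has 1 extra parameter.
stat, p, prefer_complex = likelihood_ratio_test(-1000.0, -995.0, extra_params=1)
print(f"statistic={stat:.2f}, p={p:.4f}, prefer complex model: {prefer_complex}")
```

When the improvement in log-likelihood is small, the p-value stays above the threshold and the simpler model is retained.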

An alternative model selection method is the Akaike information criterion (AIC), formally an 
estimate of the Kullback-Leibler divergence between the true model and the model being 
tested. It can be interpreted as a likelihood estimate with a correction factor to penalize 
overparameterized models. The AIC is calculated on an individual model rather than a 
pair, so it is independent of the order in which models are assessed. A related alternative, 
the Bayesian information criterion (BIC), has a similar basic interpretation but penalizes 
complex models more heavily. 
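
Both criteria can be computed directly from a model's maximized log-likelihood and its number of free parameters, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The sketch below is illustrative only: the log-likelihood values are invented, the parameter counts include only free substitution parameters (real programs often count branch lengths as well), and the alignment length is used as the sample size n, which is a common but not universal convention.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

# Hypothetical (log-likelihood, free substitution parameters) per model.
candidates = {
    "JC69": (-2050.0, 0),
    "HKY85": (-2010.0, 4),
    "GTR": (-2008.5, 8),
}
n_sites = 500  # assumed alignment length, used as the sample size

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print("best by AIC:", best_aic)
print("best by BIC:", best_bic)
```

Because each score depends on one model only, the ranking does not change with the order in which the models are evaluated. The BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is greater than about 7.4.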

See also 

List of phylogenetics software 

Cladistics 

PHYLIP 

Phylogenetic comparative methods 

Phylogenetic tree 

Phylogenetics 

Systematics 

Joe Felsenstein 

External links 

PHYLIP, a freely distributed phylogenetic analysis package 

PAUP, a similar analysis package available for purchase 

MrBayes, a program for the Bayesian estimation of phylogeny (software wiki) 

BAli-Phy, a program for simultaneous Bayesian estimation of alignment and phylogeny 

Treefinder, a graphical analysis environment for molecular phylogenetics 

Modeltest, a program for selecting appropriate substitution models for nucleotide sequences 

CIPRES: Cyberinfrastructure for Phylogenetic Research 

Phylogenetic inferring on the T-REX server 

List of phylogeny programs 

Phylogeny Algorithms Pseudocode 

License 

Version 1.2, November 2002 

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 
Everyone is permitted to copy and distribute verbatim copies 
of this license document, but changing it is not allowed. 

0. PREAMBLE 

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone 
the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License 
preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. 
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the 
GNU General Public License, which is a copyleft license designed for free software. 

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should 
come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any 
textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose 
is instruction or reference. 

1. APPLICABILITY AND DEFINITIONS 

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under 
the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated 
herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the 
license if you copy, modify or distribute the work in a way requiring permission under copyright law. 

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or 
translated into another language. 

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or 
authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. 
(Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter 
of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. 

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the 
Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. 
The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. 

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document 
is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. 

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, 
that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for 
drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats 
suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to 
thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of 
text. A copy that is not "Transparent" is called "Opaque". 

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using 
a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image 
formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML 
or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some 
word processors for output purposes only. 

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License 
requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent 
appearance of the work's title, preceding the beginning of the body of the text. 

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that 
translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", 
"Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" 
according to this definition. 

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers 
are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty 
Disclaimers may have is void and has no effect on the meaning of this License. 

2. VERBATIM COPYING 

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, 
and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to 
those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. 
However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in 
section 3. 

You may also lend copies, under the same conditions stated above, and you may publicly display copies. 

3. COPYING IN QUANTITY 

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's 
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the 
front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front 
cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying 
with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in 
other respects. 

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, 
and continue the rest onto adjacent pages. 

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy 
along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has 
access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter 
option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will 
remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or 
retailers) of that edition to the public. 

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a 
chance to provide you with an updated version of the Document. 

4. MODIFICATIONS 

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified 
Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the 
Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: 

1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there 
were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version 
gives permission. 

2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together 
with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this 
requirement. 

3. State on the Title page the name of the publisher of the Modified Version, as the publisher. 

4. Preserve all the copyright notices of the Document. 

5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. 

6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this 
License, in the form shown in the Addendum below. 

7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. 

8. Include an unaltered copy of this License. 

9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the 
Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and 
publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. 

10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network 
locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network 
location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives 
permission. 

11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and 
tone of each of the contributor acknowledgements and/or dedications given therein. 

12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered 
part of the section titles. 

13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. 

14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. 

15. Preserve any Warranty Disclaimers. 

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the 
Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the 
Modified Version's license notice. These titles must be distinct from any other section titles. 

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, 
statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. 

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover 
Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) 
any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity 
you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the 
old one. 

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply 
endorsement of any Modified Version. 

5. COMBINING DOCUMENTS 

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, 
provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant 
Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. 

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are 
multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in 
parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section 
titles in the list of Invariant Sections in the license notice of the combined work. 

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise 
combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements." 

6. COLLECTIONS OF DOCUMENTS 

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this 
License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim 
copying of each of the documents in all other respects. 

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into 
the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 

7. AGGREGATION WITH INDEPENDENT WORKS 

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution 
medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond 
what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which 
are not themselves derivative works of the Document. 

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire 
aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers 
if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 

8. TRANSLATION 

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant 
Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in 
addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, 
and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and 
disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will 
prevail. 

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will 
typically require changing the actual title. 

9. TERMINATION 

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, 
sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received 
copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 

10. FUTURE REVISIONS OF THIS LICENSE 

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be 
similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/. 

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or 
any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has 
been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any 
version ever published (not as a draft) by the Free Software Foundation. 

How to use this License for your documents 

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices 
just after the title page: 

Copyright (c) YEAR YOUR NAME. 
Permission is granted to copy, distribute and/or modify this document 
under the terms of the GNU Free Documentation License, Version 1.2 
or any later version published by the Free Software Foundation; 
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license is included in the section entitled "GNU 
Free Documentation License". 

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: 

with the Invariant Sections being LIST THEIR TITLES, with the 
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. 

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. 

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software 
license, such as the GNU General Public License, to permit their use in free software. 



